Data Collection and Generation for Preference Alignment in Speech Language Models

Mohamad Samaei

Supervisors: Eliana Pastor, Alkis Koudounas. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2025

PDF (Tesi_di_laurea) - Thesis
Restricted access: staff users only until 12 June 2027 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Despite recent advances, Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models still misrecognize words in challenging conditions, which limits their reliability. Reinforcement Learning from Human Feedback (RLHF) can reduce hallucinations in speech models by replacing a purely statistical learning objective with a human-aligned one that rewards factual, grounded, and faithful outputs while penalizing hallucinated content. Current speech assistants are typically trained on proprietary data and evaluated with metrics such as Word Error Rate (WER). At the same time, RLHF methods have largely focused on text-only models, leaving a gap in tools and datasets for applying preference-alignment training to spoken dialogue systems.
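
For readers unfamiliar with the metric mentioned above, WER is conventionally defined as the word-level edit distance between a reference transcript and a system hypothesis (substitutions + deletions + insertions), divided by the number of reference words. The following is a minimal illustrative sketch of that standard definition, not code from the thesis; the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokenization is a plain whitespace split, with no
    text normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("turn on the kitchen light", "turn on a kitchen light"))
```

Because WER only counts word-level mismatches against a single reference, it says nothing about which of two imperfect outputs a listener would prefer, which is the kind of signal preference-alignment data is meant to capture.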

This thesis addresses these gaps by presenting open-source implementations of (i) a data generation and extraction pipeline for conversational speech agents and (ii) an annotation platform for collecting human feedback, with the goal of enabling RLHF in speech models.

Supervisors

Eliana Pastor, Alkis Koudounas

Academic year

2025/26

Publication type

Electronic

Number of pages

Degree programme

Master's degree programme in Data Science and Engineering

Degree class

New system > Master's degree > LM-32 - COMPUTER ENGINEERING

URI

https://webthesis.biblio.polito.it/id/eprint/38767
