Human-Aligned Speech Language Models with Preference Alignment Data Collection

Vincenzo Montana

Human-Aligned Speech Language Models with Preference Alignment Data Collection.

Supervisors: Eliana Pastor, Alkis Koudounas. Politecnico di Torino, Master's degree in Ingegneria Informatica (Computer Engineering), 2025

PDF (Tesi_di_laurea) - Thesis
Access restricted to: staff users only until 12 June 2027 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Preference alignment techniques have achieved remarkable results in aligning Large Language Models (LLMs) with human values by comparing candidate outputs. However, these methods rely critically on human-annotated preference data, which remains difficult to collect at scale and with consistent quality. The difficulty increases in the multimodal domain, where annotators may focus on isolated aspects of a given modality (e.g., speech tone or rhythm) rather than on its overall communicative intent. This work addresses these challenges in the speech domain. Its primary goal is to collect human preference data on speech-based interactions while ensuring that annotators are properly guided to provide consistent and meaningful feedback.

To this end, a multi-stage speech pipeline, emulating a full conversation with a digital assistant, was designed