Ahmadreza Farmahini Farahani
RobVC: An End-to-End Self-Supervised Voice Conversion.
Supervisors: Santa Di Cataldo, Francesco Ponzio, Alessio Mascolini. Politecnico di Torino, Master's degree course in Ingegneria Informatica (Computer Engineering), 2025
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Current systems are mainly based on text-to-voice conversion rather than audio-to-audio voice generation, which yields a better overall result but loses the characteristics of the speaker's voice. Moreover, current voice generation approaches struggle to preserve the speaker's emotions and voice, producing mechanical voices that lack intonation. In addition, most of the models currently available for voice generation are too heavy to run in a real-time system. Furthermore, the few audio-to-audio systems available tend to generate mechanical, flat, and emotionless voices and do not generalise (e.g. a voice the model has not seen requires a fine-tuning step before it can be converted correctly and precisely).
