Ahmadreza Farmahini Farahani
RobVC: An End-to-End Self-Supervised Voice Conversion.
Supervisors: Santa Di Cataldo, Francesco Ponzio, Alessio Mascolini. Politecnico di Torino, Master of Science program in Computer Engineering, 2025
PDF (Tesi_di_laurea)
- Thesis
Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Current systems are mainly based on text-to-voice conversion rather than audio-to-audio voice generation, which yields better overall output quality but loses the characteristics of the speaker's voice. Moreover, current voice generation approaches struggle to preserve the speaker's emotions and voice, producing mechanical voices that lack intonation. In addition, most of the models currently available for voice generation are too heavy to run in a real-time system. Furthermore, the few audio-to-audio systems available tend to generate mechanical, flat, and emotionless voices and do not generalise (e.g. a voice unseen by the model requires a fine-tuning step before it can be converted correctly and precisely).
