
RobVC: An End-to-End Self-Supervised Voice Conversion

Ahmadreza Farmahini Farahani

RobVC: An End-to-End Self-Supervised Voice Conversion.

Supervisors: Santa Di Cataldo, Francesco Ponzio, Alessio Mascolini. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Current systems are mainly based on text-to-voice conversion rather than audio-to-audio voice generation, which yields a better overall result but loses the characteristics of the speaker's voice. Moreover, current voice generation approaches struggle to preserve the emotions and the voice of the speakers, producing mechanical voices that lack intonation. Additionally, most of the models currently available for voice generation are too heavy to be used in a real-time system. Furthermore, the few audio-to-audio systems available tend to generate mechanical, flat and emotionless voices and are not able to generalise (e.g. a voice unseen by the model needs a fine-tuning step before it can be converted correctly and precisely). This thesis proposes an audio-to-audio voice generation system based on deep learning; the goal is to convert one speaker's voice to sound like another, maintaining content and emotio

Supervisors: Santa Di Cataldo, Francesco Ponzio, Alessio Mascolini
Academic year: 2024/25
Publication type: Electronic
Number of pages: 109
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating institutions: Technische Universität Darmstadt
URI: http://webthesis.biblio.polito.it/id/eprint/35238