Automated creation of Podcasts empowered by Text-To-Speech

Simone Sasso

Supervisors: Antonio Vetro', Giovanni Garifo. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

The goal of Text-to-Speech (TTS) is to synthesize human-like speech from text. Over the last decade, the field has improved substantially thanks to advances in deep learning. Neural TTS models now produce speech that is almost indistinguishable from a human voice. As a result, the technology has become increasingly popular and has markedly improved the way people interact with machines. Despite this progress, neural TTS is far from a solved problem and still has several limitations. Both training and inference require heavy computational resources, and models tend to make mistakes on corner cases or on text from a domain different from that of the training set.

This thesis examines the development of a pipeline that generates podcasts by using a Text-to-Speech model to read news articles