Cinzia Ferrero
Neural networks for language and speaker recognition.
Supervisors: Pietro Laface, Sandro Cumani. Politecnico di Torino, Master's degree programme in Ingegneria del Cinema e dei Mezzi di Comunicazione, 2018
PDF (Tesi_di_laurea, 5MB) - Thesis
License: Creative Commons Attribution Non-commercial Share Alike.
Abstract
In this thesis we consider two major fields in which machine learning is applied to the human voice: language recognition and speaker recognition. For both, we provide an overview of the whole recognition chain, from the acoustic signal to the classifier, and we present applications of neural networks to classification. Since language and speaker recognition systems share some techniques, the initial part of the thesis surveys the approaches common to both problems. We first analyze state-of-the-art techniques for pre-processing the speech signal, extracting its relevant features, and representing them by means of statistical models. We then focus on the working principles of neural networks and on several methods for their training and regularization.
Within the context of language recognition, we propose a neural network architecture to classify i-vectors, which are modelled on the basis of the recently presented Stacked Bottleneck Neural Network (SBN) features
