Cinzia Ferrero
Neural networks for language and speaker recognition.
Supervisors: Pietro Laface, Sandro Cumani. Politecnico di Torino, Master of Science program in Cinema and Media Engineering, 2018
Licence: Creative Commons Attribution Non-commercial Share Alike.
Abstract
In this thesis we consider two major fields in which machine learning is applied to the human voice: language recognition and speaker recognition. For both, we provide an overview of the whole recognition chain, from the acoustic signal to the classifier, and we present applications of neural networks for classification. Since language and speaker recognition systems share some techniques, the initial part of this thesis is an overview of the common approaches to the recognition problem. We first analyze state-of-the-art techniques to pre-process the speech signal, to extract its relevant features, and to represent them by means of statistical models. We then focus on the working principles of neural networks and on several methods for their training and regularization.
Within the context of language recognition, we propose a neural network architecture to classify i-vectors, which are modelled on the basis of the recently presented Stacked Bottleneck Neural Network (SBN) features
