
Neural networks for language and speaker recognition

Cinzia Ferrero

Neural networks for language and speaker recognition.

Supervisors: Pietro Laface, Sandro Cumani. Politecnico di Torino, Master's degree in Cinema and Media Engineering (Ingegneria Del Cinema E Dei Mezzi Di Comunicazione), 2018

Thesis PDF (Tesi_di_laurea), 5 MB
License: Creative Commons Attribution Non-commercial Share Alike.
Abstract:

In this thesis we consider two major fields in which machine learning is applied to the human voice: language recognition and speaker recognition. For both, we provide an overview of the whole recognition chain, from the acoustic signal to the classifier, and we present applications of neural networks to classification. Since language and speaker recognition systems share several techniques, the first part of the thesis reviews the approaches common to both problems. We first analyze state-of-the-art techniques for pre-processing the speech signal, extracting its relevant features and representing them by means of statistical models. We then focus on the working principles of neural networks and on several methods for training and regularizing them. For language recognition, we propose a neural network architecture to classify i-vectors computed from the recently presented Stacked Bottleneck Neural Network (SBN) features. Compared with a Gaussian Linear classifier, this solution performs slightly better. For speaker recognition, we focus on the pairwise approach, which consists of deciding whether a pair of i-vectors belongs to the same-speaker or to the different-speaker class. In particular, we present a siamese neural network architecture that performs this binary classification of a pair of i-vectors, and we propose different techniques for sharing its layer weights. The resulting architecture improves on the scores of a previously proposed siamese network, but it does not outperform systems based on Probabilistic Linear Discriminant Analysis (PLDA) or Pairwise Support Vector Machines (PSVM).
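The abstract describes the pairwise siamese approach only at a high level. The following is a minimal NumPy sketch, under our own assumptions, of how a siamese network can score a pair of i-vectors: both inputs pass through the same branch (shared weights), the two embeddings are combined symmetrically, and a logistic output gives a same-speaker probability. The layer sizes, the tanh activations, the |h1 - h2| and h1 * h2 combination, and the names `branch`, `siamese_score` and `bce_loss` are hypothetical illustrations. They are not the architecture or the weight-sharing schemes proposed in the thesis. The weights are random, so the scores carry no meaning until the network is trained, for example with the binary cross-entropy shown.

```python
import numpy as np

def branch(x, weights):
    """Shared-weight branch: the same layers (same W, b) are applied to each input."""
    h = x
    for W, b in weights:
        h = np.tanh(h @ W + b)
    return h

def siamese_score(x1, x2, weights, v, c):
    """Probability that i-vectors x1 and x2 belong to the same speaker (untrained sketch)."""
    h1 = branch(x1, weights)
    h2 = branch(x2, weights)
    # Symmetric combination of the two embeddings: swapping x1 and x2 gives the same score.
    z = np.concatenate([np.abs(h1 - h2), h1 * h2])
    logit = z @ v + c
    return 1.0 / (1.0 + np.exp(-logit))

def bce_loss(p, same):
    """Binary cross-entropy for a pair labelled same (1) or different (0) speaker."""
    eps = 1e-12
    return -(same * np.log(p + eps) + (1 - same) * np.log(1 - p + eps))

rng = np.random.default_rng(0)
dim, hidden = 400, 64  # hypothetical i-vector and hidden-layer sizes
weights = [
    (rng.normal(scale=0.05, size=(dim, hidden)), np.zeros(hidden)),
    (rng.normal(scale=0.1, size=(hidden, hidden)), np.zeros(hidden)),
]
v = rng.normal(scale=0.1, size=2 * hidden)  # output-layer weights
c = 0.0                                     # output-layer bias

# Two random vectors standing in for real i-vectors.
x1 = rng.normal(size=dim)
x2 = rng.normal(size=dim)
p = siamese_score(x1, x2, weights, v, c)
```

Because the combination of the two embeddings is symmetric, swapping the two inputs leaves the score unchanged, which is a natural property for a same/different-speaker decision. Other combinations, such as plain concatenation, would require the network to learn this symmetry from data.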

Supervisors: Pietro Laface, Sandro Cumani
Academic year: 2018/19
Publication type: Electronic
Number of pages: 106
Degree programme: Master's degree in Cinema and Media Engineering (Ingegneria Del Cinema E Dei Mezzi Di Comunicazione)
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering (INGEGNERIA INFORMATICA)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/18691