Politecnico di Torino

Voice analysis: from speaker identification to speaker verification using Siamese Neural Network

Antonio Falabella

Voice analysis: from speaker identification to speaker verification using Siamese Neural Network. Supervisor: Silvia Anna Chiusano. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2021.

PDF (Tesi_di_laurea): Thesis, 3 MB
License: Creative Commons Attribution Non-commercial No Derivatives.

Nowadays, one of the most tedious tasks of our online life is verifying our identity through passwords, passphrases, PINs and cards. This work analyses the possibility of using recorded voice to identify someone automatically. Recognising a person by their voice is an important human ability that is usually taken for granted in face-to-face interactions; when visual verification is not possible, as in telephone calls, correctly identifying and verifying who is speaking becomes crucial. This thesis addresses the identification and verification of a person's identity through their voice imprint, using at most two raw audio inputs in the test phase. A key prerequisite is understanding how an audio signal is structured and how it can be processed; to this end, the major techniques of recent years will be presented and discussed.

The first objective is to identify a speaker within a small group of people using a Deep Learning approach. In this phase, the Neural Network learns an internal representation of the almost-unique voice imprint of each person in the group. A larger set of speakers taken from the popular LibriSpeech dataset is then considered to evaluate how well the network generalises. Several architectures, including Convolutional Neural Networks, Residual Neural Networks and SincNet, will be examined, and their results on different datasets will be discussed.

The second objective is a more general task: verifying a person's identity among a virtually unlimited number of people. Here the Neural Network must learn to distinguish recordings made by the same speaker from recordings made by different speakers. For this step, both a naive approach based on the architecture from the first step and a more sophisticated Siamese Neural Network will be proposed, and several architectures will be presented.
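The verification idea can be illustrated with a minimal sketch of a Siamese setup, assuming a contrastive loss and a Euclidean-distance threshold. The abstract does not state which loss, distance or threshold the thesis used, so these are illustrative choices. `make_embedder` is a hypothetical stand-in: a fixed random linear projection replaces the trained encoder (for example a CNN, ResNet or SincNet), and the inputs are assumed to be pre-extracted feature vectors rather than raw audio. Both recordings of a pair pass through the same embedding function; this weight sharing is what makes the network Siamese.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def make_embedder(in_dim, out_dim, seed=0):
    """Hypothetical stand-in for the shared branch of a Siamese network.

    A fixed random linear projection replaces a trained encoder; both
    recordings of a pair are mapped by this same function (shared weights).
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def embed(features):
        projected = [sum(w * x for w, x in zip(row, features)) for row in weights]
        return l2_normalize(projected)

    return embed


def euclidean(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Contrastive loss: pull same-speaker pairs together and push
    different-speaker pairs at least `margin` apart."""
    d = euclidean(emb_a, emb_b)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def verify(embed, features_a, features_b, threshold=0.5):
    """Accept the pair as the same speaker if the embedding distance is below threshold."""
    return euclidean(embed(features_a), embed(features_b)) < threshold


# Usage with synthetic 40-dimensional feature vectors (illustrative data only).
embed = make_embedder(in_dim=40, out_dim=16)
rng = random.Random(1)
utt_a = [rng.gauss(0.0, 1.0) for _ in range(40)]
utt_b = [x + rng.gauss(0.0, 0.05) for x in utt_a]  # small perturbation of utt_a
utt_c = [rng.gauss(0.0, 1.0) for _ in range(40)]   # unrelated vector
print(verify(embed, utt_a, utt_b), verify(embed, utt_a, utt_c))
print(contrastive_loss(embed(utt_a), embed(utt_c), same_speaker=False))
```

The same `verify` function would also cover one plausible reading of the naive approach: taking `embed` to be the penultimate layer of the identification classifier from the first step instead of a network trained on pairs.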

Supervisor: Silvia Anna Chiusano
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 59
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master of Science > LM-32 - Computer Systems Engineering
Collaborating company: DATA Reply S.r.l. con Unico Socio
URI: http://webthesis.biblio.polito.it/id/eprint/21992