Performance of Deep Neural Networks for Sound Event Localization and Detection in Varying Noise Conditions

Emanuel Cascione

Supervisors: Luciano Lavagno, Mihai Teodor Lazarescu. Politecnico di Torino, Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica), 2023

Abstract:

This thesis investigates sound event localization and detection (SELD), an emerging research topic in audio signal processing and machine learning. SELD aims to identify and localize multiple overlapping sounds in an acoustic environment, with applications in audio surveillance, robotic auditory systems, and human-machine interaction. Its main challenges are reverberation, noise, and source variability. The goal of this thesis is to compare machine learning algorithms and architectures that can solve SELD, focusing on deep neural networks (DNNs). The research questions are: How do different types of DNNs perform on SELD? What are the pros and cons of each type? How do DNNs perform under different noise conditions?

The methodology consists of training and testing DNN models on the same dataset, which contains recordings from first-order ambisonics microphones. Each audio track contains at most two overlapping active sound sources at a time. The input features are mel spectrograms and intensity vectors, which capture the spectral and spatial information of the sound sources, respectively. Models are evaluated using the metrics from the DCASE challenges: error rate, F-score, localization error, and localization recall.

The results show that different types of DNNs have different strengths and weaknesses for SELD. The models fall into three categories: convolutional recurrent neural networks (CRNNs), which combine convolutional layers for feature extraction with recurrent layers for temporal modeling; temporal convolutional networks (TCNs), which use only convolutional layers with large receptive fields; and event-independent networks (EINs), which use separate paths for sound event detection (SED) and direction of arrival (DOA) estimation and exchange information between them through cross-parameter connections. This technique is also known as soft-parameter sharing (SPS) and differs from the hard-parameter sharing used by CRNNs and TCNs. No-parameter sharing (nPS) refers to a network that uses separate paths for the two tasks but has no cross-parameter connections.

The main findings are: CRNNs perform poorly on both SED and DOA estimation; TCNs perform slightly better than CRNNs; EINs perform well on SELD; adding squeeze-excitation blocks to CRNNs improves SED at a low cost in parameters; replacing LSTMs with TCNs in CRNNs adds a large number of parameters but speeds up training and inference by a factor of 5 to 6 on the development hardware; replacing transformer attention with Conformer blocks in EINs reduces SELD performance; SPS improves SELD performance with respect to nPS; and EINs are more robust to noise than the other models. Based on these results, EINs can be recommended for SELD, since they achieve the best performance among all models and can handle multiple simultaneous instances of the same class.

This work provides a rigorous comparison of DNN architectures for SELD using objective metrics and a common dataset. It also provides a benchmark for future research on SELD, as well as insights into how to design and optimize DNN models for this task. The main contribution of this thesis is the proposal and evaluation of different DNN models for the SELD task under different noise conditions. Directions for future work include exploring other input features, data augmentation techniques, architectures, and attention mechanisms for SELD, and applying SELD to real-world applications.
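
The localization error named among the evaluation metrics is based on the angle between a reference and an estimated direction of arrival. The following is a minimal Python sketch of that angular distance only, assuming DOAs are given as (azimuth, elevation) pairs in degrees and that references and estimates have already been matched. The function names are hypothetical and are not taken from the thesis or from the DCASE evaluation code, and the class-dependent matching used by the full DCASE metric is not reproduced here.

```python
import math


def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert an (azimuth, elevation) pair in degrees to a Cartesian unit vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        math.cos(el) * math.cos(az),
        math.cos(el) * math.sin(az),
        math.sin(el),
    )


def angular_error_deg(reference, estimate):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs."""
    u = doa_to_unit_vector(*reference)
    v = doa_to_unit_vector(*estimate)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


def mean_localization_error(matched_pairs):
    """Average angular error over already-matched (reference, estimate) pairs.

    Assumption: matching of estimates to references is done elsewhere.
    """
    if not matched_pairs:
        raise ValueError("at least one matched pair is required")
    total = sum(angular_error_deg(ref, est) for ref, est in matched_pairs)
    return total / len(matched_pairs)


if __name__ == "__main__":
    # Hypothetical example values, not results from the thesis.
    pairs = [((30.0, 10.0), (35.0, 12.0)), ((-90.0, 0.0), (-80.0, 5.0))]
    print(mean_localization_error(pairs))
```

Working on unit vectors rather than subtracting azimuths and elevations directly avoids wrap-around problems near ±180° azimuth and keeps the error meaningful close to the poles.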

Supervisors: Luciano Lavagno, Mihai Teodor Lazarescu
Academic year: 2022/23
Publication type: Electronic
Number of pages: 73
Additional information: Thesis under embargo. Full text not available
Subjects:
Degree programme: Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica)
Degree class: New system > Master's degree > LM-25 - Automation Engineering (Ingegneria dell'Automazione)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/27815