Low-complexity neural networks for robust acoustic scene classification in wearable audio devices

Michele Panariello

Supervisor: Antonio Servetti. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

This work concerns the design of a machine learning pipeline that performs acoustic scene classification (ASC) on a pair of headphones by means of a convolutional neural network (CNN). ASC is the task of recognizing an environment (e.g. bus, park, office) from the sounds it produces (e.g. engine noise, birds chirping, typing). In our setting, the goal is to make the headphones context-aware in order to enhance the user experience. We capture audio from the headphones' microphone and run the CNN on the headphones' own hardware to perform classification in real time. A challenging aspect of the task is the lack of recordings made with the headphones' microphone, which forces us to resort to external data sources. This can be problematic: training on audio acquired with a microphone different from the one in the final device may cause a data distribution shift and degrade classification performance (a phenomenon known as "device mismatch").
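
To make the notion of device mismatch concrete, the following minimal sketch simulates the same synthetic scenes recorded through two microphones. It is an illustration under stated assumptions, not the pipeline of the thesis: the per-band gains, the four-band layout, and the helper names `record` and `band_means` are all hypothetical. A fixed frequency response multiplies the energy in each band, so in the log domain it appears as a constant per-band offset in the features the classifier sees.

```python
import math
import random

# Hypothetical per-band gains (dB) of two microphones; real responses would be measured.
TRAIN_MIC_GAIN_DB = [0.0, 0.0, 0.0, 0.0]
DEVICE_MIC_GAIN_DB = [-6.0, -1.0, 2.0, -9.0]


def record(band_energies, mic_gain_db):
    """Return log band energies (dB) as seen through a microphone.

    A fixed frequency response scales each band's energy, which
    becomes an additive offset in the log domain.
    """
    return [10.0 * math.log10(e) + g for e, g in zip(band_energies, mic_gain_db)]


def band_means(clips):
    """Mean log energy per band over a list of clips."""
    n = len(clips)
    return [sum(clip[b] for clip in clips) / n for b in range(len(clips[0]))]


rng = random.Random(0)
# Synthetic "scenes": random positive energies in 4 frequency bands.
scenes = [[rng.uniform(0.1, 10.0) for _ in range(4)] for _ in range(200)]

train_features = [record(s, TRAIN_MIC_GAIN_DB) for s in scenes]
device_features = [record(s, DEVICE_MIC_GAIN_DB) for s in scenes]

# Per-band shift of the feature distribution between the two microphones.
shift = [d - t for d, t in zip(band_means(device_features), band_means(train_features))]
print([round(x, 2) for x in shift])
```

In this toy setting the shift equals the difference between the two gain curves, even though the underlying scenes are identical. A classifier trained only on the first microphone would therefore be evaluated on inputs drawn from a displaced distribution.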

Moreover, because of the embedded environment, only a low-complexity CNN can be used, which may limit modeling accuracy.
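
As a rough illustration of what "low complexity" constrains, the sketch below counts the parameters and multiply-accumulate operations (MACs) of a small CNN. The layer sizes, the 64x64 single-channel input, the 10 output classes, and the helper names `conv2d_cost` and `dense_cost` are assumptions chosen for the example; they do not describe the architecture used in the thesis.

```python
def conv2d_cost(in_ch, out_ch, kernel, out_h, out_w):
    """Parameters and MACs of a standard 2-D convolution with a square kernel."""
    weights = in_ch * out_ch * kernel * kernel
    params = weights + out_ch          # weights plus one bias per output channel
    macs = weights * out_h * out_w     # one MAC per weight per output position
    return params, macs


def dense_cost(in_features, out_features):
    """Parameters and MACs of a fully connected layer."""
    weights = in_features * out_features
    return weights + out_features, weights


# Hypothetical toy network: 1x64x64 input, "same" padding,
# 2x2 pooling after the first two convolutions, then global average
# pooling and a 10-class output layer.
layers = [
    conv2d_cost(1, 8, 3, 64, 64),
    conv2d_cost(8, 16, 3, 32, 32),
    conv2d_cost(16, 32, 3, 16, 16),
    dense_cost(32, 10),
]

total_params = sum(p for p, _ in layers)
total_macs = sum(m for _, m in layers)

print(f"parameters: {total_params}")
print(f"MACs per inference: {total_macs}")
print(f"size at 8 bits per parameter: {total_params / 1024:.1f} KiB")
```

Counts of this kind are a common way to check whether a model fits the memory and compute budget of an embedded target before training it; the actual budget depends on the device and is not stated here.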