
Explaining black-box models in the context of audio classification

Giuseppe De Luca

Explaining black-box models in the context of audio classification.

Supervisors: Tania Cerquitelli, Francesco Ventura. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.


Over the years, Artificial Intelligence (AI) has grown in importance and impact in today's society, and we have grown accustomed to accepting its decisions. Several aspects of our everyday life are based on AI decisions, and in some important fields these decisions are life-changing and failure is not acceptable. AI algorithms are very powerful in terms of prediction results. However, those algorithms suffer from opacity, that is, we have difficulty understanding the reasons behind decisions that are, in some cases, crucial. Explainable Artificial Intelligence (XAI) proposes a move to a more transparent and understandable AI. The goal is to build a suite of technologies that produce more explainable models without lowering performance. In our work we propose A-EBAnO, an adaptation of a more general explanation framework to the context of audio classification. A-EBAnO provides local explanations of the prediction for an input audio based on the analysis of its Log-Mel spectrogram, a representation of how audio power changes over time at different frequencies. We analyze the decision-making process of a black-box CNN built upon a slightly modified version of VGG. The produced explanations present both visual and numerical information. Explanations are obtained by mining the inner knowledge of the convolutional layers of the CNN to find interpretable features of the input audio. Two indexes measure the influence and the influence precision of the extracted features on the prediction, iteratively comparing the prediction probabilities on the original input with those on several versions of it perturbed at the mined features. A set of useful explanations is produced and the most informative one is chosen.
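The comparison between prediction probabilities on the original and perturbed inputs can be illustrated with a small sketch. The abstract does not give the definition of the two indexes, so `relative_influence` below is a simplified stand-in (the relative drop in class probability), not the thesis's actual index. The feature names and probability values are hypothetical.

```python
from statistics import mean


def relative_influence(p_original, p_perturbed_runs):
    """Simplified stand-in for an influence score (assumption, not the thesis's index).

    Returns the relative drop in the predicted-class probability, averaged over
    several perturbed versions of the input. A positive value means the perturbed
    feature supported the prediction; a negative value means it worked against it.
    """
    if not 0.0 < p_original <= 1.0:
        raise ValueError("p_original must be in (0, 1]")
    if not p_perturbed_runs:
        raise ValueError("at least one perturbed run is required")
    return (p_original - mean(p_perturbed_runs)) / p_original


def most_informative(scores):
    """Return the feature name whose score has the largest absolute value."""
    return max(scores, key=lambda name: abs(scores[name]))


# Hypothetical probabilities for the predicted class (not taken from the thesis).
p_original = 0.5
perturbed = {
    "feature_a": [0.25, 0.25],  # probability halves when this feature is perturbed
    "feature_b": [0.5, 0.5],    # probability is unchanged
}
scores = {
    name: relative_influence(p_original, runs) for name, runs in perturbed.items()
}
best = most_informative(scores)
```

Averaging over several perturbed runs is used here because the perturbation is random, so a single run could over- or under-state a feature's effect.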
The main contribution of our work is the complete adaptation of the framework to the new context. It starts with an input pre-processing phase in which the Log-Mel spectrogram is computed. It continues with the explanation-making process, for which we built a perturbation process suitable for the context, using Additive White Gaussian Noise directly on the spectrogram, studied and adapted the feature extraction technique of the general framework, and created two new feature extraction techniques to better analyze the input audio over time and frequency bands. It concludes with a post-processing phase needed to create easily understandable explanations. The proposed technique has been validated on a large number of input audio samples; explanations of the classification of particular audio Log-Mel spectrograms are provided, together with a more general analysis of the technique's results on the inputs.
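As a rough illustration of perturbing a spectrogram with Additive White Gaussian Noise over a time or frequency band, the following minimal sketch uses only the Python standard library. The function name `perturb_band`, the `[mel_bin][frame]` layout, and the `sigma` noise level are assumptions made for this example; the thesis's own perturbation and feature extraction procedures may differ.

```python
import random


def perturb_band(spec, axis, start, stop, sigma=1.0, seed=None):
    """Return a copy of `spec` with zero-mean Gaussian noise added to one band.

    `spec` is a list of rows indexed as spec[mel_bin][frame] (assumed layout).
    axis="time" perturbs frames start..stop-1 across all mel bins;
    axis="freq" perturbs mel bins start..stop-1 across all frames.
    """
    if axis not in ("time", "freq"):
        raise ValueError("axis must be 'time' or 'freq'")
    rng = random.Random(seed)
    out = [row[:] for row in spec]  # copy so the original spectrogram is untouched
    for m, row in enumerate(out):
        for t in range(len(row)):
            index = t if axis == "time" else m
            if start <= index < stop:
                row[t] += rng.gauss(0.0, sigma)
    return out


# Hypothetical 4-bin x 6-frame spectrogram filled with zeros.
spec = [[0.0] * 6 for _ in range(4)]
noisy_time = perturb_band(spec, "time", 2, 4, sigma=1.0, seed=0)
noisy_freq = perturb_band(spec, "freq", 1, 2, sigma=1.0, seed=0)
```

Returning a copy keeps the original spectrogram available, so the classifier's output on the original and on each perturbed version can be compared.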

Supervisors: Tania Cerquitelli, Francesco Ventura
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 99
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/18117