To ask or to abstain, what is the best strategy? Finding the best trade-off between: Active Learning and Learning to Reject

Daniele Giannuzzi

To ask or to abstain, what is the best strategy? Finding the best trade-off between: Active Learning and Learning to Reject.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Abstaining from uncertain predictions has received growing interest in recent years. Although reject options for machine learning models in supervised scenarios have been addressed in many works in the literature, they appear to be a completely unexplored field for anomaly detection, where few or no labels are available and a misclassification can be very expensive for a company. In this work, we introduce a novel technique that lets anomaly detectors abstain from uncertain predictions by adding a reject option in both unsupervised and semi-supervised scenarios. Because the framework is based on a dependent rejector that relies on the model's confidence, it can be used regardless of the anomaly detector chosen. In the unsupervised setting, a natural threshold is used to reject samples. In the semi-supervised setting, the threshold is instead tuned using labels so as to minimize the overall cost. The cosine distance is used to measure the reward the model gains from spending labels on Active Learning or on Learning to Reject, and a trade-off is then found for allocating labels between the two strategies. We evaluated our approach on a benchmark of 9 anomaly detection datasets. The results show strong performance in rejecting samples whose misclassification cost could be high. The framework with rejection outperforms plain Active Learning without rejection in both the unsupervised and semi-supervised settings.
Supervisors:	Paolo Garza
Academic year:	2022/23
Publication type:	Electronic
Number of pages:	60
Subjects:
Degree programme:	Master's degree programme in Data Science And Engineering
Degree class:	New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Joint supervision institution:	KUL - KATHOLIEKE UNIVERSITEIT LEUVEN (BELGIUM)
Collaborating companies:	KU Leuven
URI:	http://webthesis.biblio.polito.it/id/eprint/25576
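
The semi-supervised rejection step described in the abstract, where a confidence threshold is tuned on labels to minimize the overall cost, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the cost values (`cost_error`, `cost_reject`), and the simple additive cost model are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def total_cost(
    confidences: Sequence[float],
    is_correct: Sequence[bool],
    threshold: float,
    cost_error: float,
    cost_reject: float,
) -> float:
    """Cost of rejecting every sample whose confidence is below `threshold`.

    Assumed cost model (hypothetical): each rejection costs `cost_reject`,
    each accepted but wrong prediction costs `cost_error`.
    """
    cost = 0.0
    for conf, ok in zip(confidences, is_correct):
        if conf < threshold:
            cost += cost_reject      # the model abstains on this sample
        elif not ok:
            cost += cost_error       # the model predicts and is wrong
    return cost


def tune_threshold(
    confidences: Sequence[float],
    is_correct: Sequence[bool],
    cost_error: float = 1.0,
    cost_reject: float = 0.3,
) -> Tuple[float, float]:
    """Pick the confidence threshold with the lowest cost on labeled data.

    Candidates are 0.0 (reject nothing, assuming confidences >= 0), every
    observed confidence value, and infinity (reject everything).
    Ties are broken in favor of the lower threshold (fewer rejections).
    """
    candidates = [0.0] + sorted(set(confidences)) + [float("inf")]
    best = min(
        candidates,
        key=lambda t: (total_cost(confidences, is_correct, t, cost_error, cost_reject), t),
    )
    return best, total_cost(confidences, is_correct, best, cost_error, cost_reject)


def predict_with_reject(
    predictions: Sequence[int],
    confidences: Sequence[float],
    threshold: float,
) -> List[Optional[int]]:
    """Return the detector's prediction, or None where the model abstains."""
    return [p if c >= threshold else None for p, c in zip(predictions, confidences)]


if __name__ == "__main__":
    # Toy labeled set: confidence of each prediction and whether it was right.
    conf = [0.95, 0.90, 0.60, 0.55, 0.85]
    correct = [True, True, False, False, True]
    threshold, cost = tune_threshold(conf, correct)
    print(threshold, cost)
    print(predict_with_reject([1, 0, 1, 0, 0], conf, threshold))
```

Because the rejector only looks at a confidence value per sample, the same sketch applies to any anomaly detector that can expose such a confidence, which mirrors the detector-agnostic property claimed in the abstract.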