
To ask or to abstain, what is the best strategy? Finding the best trade-off between: Active Learning and Learning to Reject

Daniele Giannuzzi

To ask or to abstain, what is the best strategy? Finding the best trade-off between: Active Learning and Learning to Reject.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree in Data Science And Engineering, 2022

PDF (Tesi_di_laurea) - Thesis, 864 kB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

The problem of abstaining from uncertain predictions has received growing interest in recent years. However, while adding a reject option to a machine learning model in the supervised setting has been addressed in many works in the literature, it appears to be an unexplored field for anomaly detection, where few or no labels are available and a misclassification can be very expensive for a company. In this work, we introduce a novel technique that lets anomaly detectors abstain from uncertain predictions, adding a reject option in both unsupervised and semi-supervised scenarios. Because the framework is based on a dependent rejector that uses the model's confidence, it can be applied regardless of the chosen anomaly detector. In the unsupervised setting, a natural threshold is used to reject samples; in the semi-supervised setting, the threshold is instead tuned using labels so as to minimize the overall cost. The cosine distance is used to measure the model's reward from spending labels on Active Learning or on Learning to Reject, and a trade-off is then found in allocating labels between the two strategies. We evaluated our approach on a benchmark of 9 anomaly detection datasets. The results show that the framework performs well at rejecting samples whose misclassification cost could be high, and that the framework with rejection outperforms plain Active Learning without rejection in both the unsupervised and semi-supervised settings.
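As an illustration only, the semi-supervised step described above (a confidence-based, dependent rejector whose threshold is tuned on labels to minimize cost) can be sketched as follows. This is not the thesis's implementation: the abstract does not specify the confidence measure, the cost values, the unsupervised "natural threshold", or how the cosine distance is used, so none of those are reproduced. The sketch assumes a detector that outputs a binary prediction (1 = anomaly) and a confidence in [0, 1], a fixed rejection cost `c_rej`, and error costs `c_fp`/`c_fn`; all function names and cost values are hypothetical.

```python
def total_cost(preds, confidences, labels, tau, c_fp=1.0, c_fn=1.0, c_rej=0.5):
    """Average cost of a hypothetical confidence-based rejector at threshold tau.

    A sample is rejected when its confidence is below tau. Rejected samples
    pay c_rej; accepted samples pay c_fp (false positive) or c_fn (false
    negative) when misclassified, and nothing when correct.
    """
    cost = 0.0
    for p, conf, y in zip(preds, confidences, labels):
        if conf < tau:
            cost += c_rej      # abstain: pay the fixed rejection cost
        elif p == 1 and y == 0:
            cost += c_fp       # normal sample flagged as an anomaly
        elif p == 0 and y == 1:
            cost += c_fn       # anomaly missed
    return cost / len(labels)


def tune_threshold(preds, confidences, labels, **costs):
    """Return the threshold with the lowest average cost on labeled samples."""
    # Candidates: every observed confidence (the smallest rejects nothing),
    # plus a value just above the maximum (rejects everything).
    candidates = sorted(set(confidences)) + [max(confidences) + 1e-9]
    return min(candidates,
               key=lambda t: total_cost(preds, confidences, labels, t, **costs))


def predict_with_reject(preds, confidences, tau):
    """Replace predictions whose confidence is below tau with None (rejected)."""
    return [None if c < tau else p for p, c in zip(preds, confidences)]


# Toy labeled data (made up for illustration): 1 = anomaly, 0 = normal.
preds = [1, 0, 1, 0, 1, 0]
labels = [1, 0, 0, 0, 1, 1]
confidences = [0.9, 0.95, 0.55, 0.8, 0.85, 0.6]

tau = tune_threshold(preds, confidences, labels, c_fp=1.0, c_fn=1.0, c_rej=0.5)
print(tau, predict_with_reject(preds, confidences, tau))
# → 0.8 [1, 0, None, 0, 1, None]
```

With these made-up costs, rejecting the two low-confidence mistakes (cost 0.5 each) is cheaper than accepting them (cost 1.0 each), so the tuned threshold stops just above them; raising it further would start rejecting correct predictions and increase the cost again.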

Supervisors: Paolo Garza
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 60
Degree programme: Master's degree in Data Science And Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Co-supervising institution: KUL - KATHOLIEKE UNIVERSITEIT LEUVEN (Belgium)
Collaborating companies: KU Leuven
URI: http://webthesis.biblio.polito.it/id/eprint/25576