Daniele Giannuzzi
To ask or to abstain, what is the best strategy? Finding the best trade-off between: Active Learning and Learning to Reject.
Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022
PDF (Tesi_di_laurea)
- Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
The problem of abstaining from uncertain predictions has attracted growing interest in the last few years. Although adding a reject option to a machine learning model in a supervised scenario has been addressed in many works in the literature, it appears to be unexplored for anomaly detection, where few or no labels are available and a misclassification can be very expensive for a company. In this work, we introduce a novel technique that lets anomaly detectors abstain from uncertain predictions by adding a reject option for both unsupervised and semi-supervised scenarios. The framework is based on a dependent rejector that uses the model confidence, so it can be applied regardless of the anomaly detector chosen. In the unsupervised setting, a natural threshold is used to reject samples. In the semi-supervised setting, the threshold is instead tuned using labels so as to minimize the overall cost. The cosine distance is used to measure the reward the model obtains from spending labels on Active Learning or on Learning to Reject, and a trade-off is then found between using labels for one strategy or the other. We evaluated our approach on a benchmark of 9 anomaly detection datasets. The results show that the approach performs well at rejecting samples whose misclassification cost could be high. The framework with rejection outperforms plain Active Learning without rejection in both the unsupervised and the semi-supervised settings. |
---|---|
Supervisors: | Paolo Garza |
Academic year: | 2022/23 |
Publication type: | Electronic |
Number of Pages: | 60 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science And Engineering |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Joint supervision institution: | KUL - KATHOLIEKE UNIVERSITEIT LEUVEN (BELGIUM) |
Collaborating companies: | KU Leuven |
URI: | http://webthesis.biblio.polito.it/id/eprint/25576 |
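The abstract describes a confidence-based reject option whose threshold is tuned on labels by minimizing an overall cost. The following is a minimal illustrative sketch of that general idea, not the method used in the thesis. The confidence measure (distance of an anomaly probability from 0.5), the cost values, the candidate grid, and all function names are assumptions made for illustration only.

```python
import numpy as np


def predict_with_reject(anomaly_prob, threshold):
    """Return 1 (anomaly), 0 (normal) or -1 (reject) for each sample."""
    anomaly_prob = np.asarray(anomaly_prob, dtype=float)
    # Assumed confidence: how far the probability is from 0.5, scaled to [0, 1].
    confidence = np.abs(2.0 * anomaly_prob - 1.0)
    labels = (anomaly_prob >= 0.5).astype(int)
    # Abstain whenever the confidence falls below the rejection threshold.
    labels[confidence < threshold] = -1
    return labels


def total_cost(pred, y_true, c_fp=1.0, c_fn=1.0, c_reject=0.3):
    """Sum of false positive, false negative and rejection costs (assumed values)."""
    pred = np.asarray(pred)
    y_true = np.asarray(y_true)
    n_fp = np.sum((pred == 1) & (y_true == 0))
    n_fn = np.sum((pred == 0) & (y_true == 1))
    n_reject = np.sum(pred == -1)
    return c_fp * n_fp + c_fn * n_fn + c_reject * n_reject


def tune_threshold(anomaly_prob, y_true, candidates=None, **costs):
    """Pick the candidate threshold with the lowest cost on labeled samples."""
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 21)
    cost_per_candidate = [
        total_cost(predict_with_reject(anomaly_prob, t), y_true, **costs)
        for t in candidates
    ]
    return float(candidates[int(np.argmin(cost_per_candidate))])


if __name__ == "__main__":
    probs = [0.05, 0.45, 0.55, 0.95]  # hypothetical detector outputs
    y = [0, 1, 0, 1]                  # hypothetical labels
    t = tune_threshold(probs, y)
    print(t, predict_with_reject(probs, t))
```

Because the rejector only consumes a per-sample probability, the sketch is independent of the detector that produced it, which mirrors the detector-agnostic property claimed in the abstract. In a label-free setting the tuning step would be replaced by a fixed threshold.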