
Automatic Classification of Software Issue Report

Andrea Simone Foderaro

Automatic Classification of Software Issue Report.

Supervisors: Luca Ardito, Maurizio Morisio. Politecnico di Torino, Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Context: In software issue reporting, correctly classifying potential new bugs is often time-consuming and expensive. One of the main causes is that the process is carried out by humans, which frequently leads to misclassifications. Since these errors waste time and resources, they should be reduced to a minimum. Moreover, shortening bug triage would lower the cost of the entire bug removal process.
Goal: The aim of this thesis is to build a tool that automatically classifies a bug and assigns it to the correct class. Given a set of information about the bug, the tool is designed to assign it to the right developers, i.e. to label the bug correctly. In more detail, the model assigns the bug to the most probable class.
Method: The tool works on Mozilla bugs taken from the Mozilla bug database and focuses on assigning a new bug to its corresponding class. Each bug in the Bugzilla database is characterised by a bug report, which contains all the information on the bug from the time it was reported to the time it was removed. The model is based on the BugBug algorithm, with some modifications to better fit the problem at hand. The algorithm builds either a OneVsRest or a binary classifier from a dataset of labeled bugs. The configuration is selected through the command-line arguments passed to the algorithm.
Results: The project was tested in two scenarios: a single class, treated as the positive class, against all the others, labeled as the negative class; and a multi-class setting. The former reached an accuracy above 60% for almost all labels, while the latter reached roughly 28%, below 30%. Both scenarios were also tested extensively to find the best hyper-parameters for the dataset under consideration.
Conclusions: The tool fulfilled the goal it was developed for, but the accuracy was not optimal in all cases. The accuracy obtained with the OneVsRest model never exceeded the 30% threshold, making that model unfit for practical application. The binary model, on the other hand, always showed acceptable accuracy, even though it requires more time to classify a bug. Both classifiers can also be easily extended to bugs coming from sources other than Bugzilla, which would broaden their field of applicability. A few changes could still be made with regard to the precision of the classifier.
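
The two scenarios described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis tool or the BugBug code: the flag names (`--mode`, `--positive-class`), the helper functions and the example labels are all hypothetical, chosen only to show how a command-line argument could select the scenario and how the training labels differ between the binary (one class against all the others) and the multi-class case.

```python
import argparse


def build_parser():
    # Hypothetical flags: the abstract does not list the tool's real arguments.
    parser = argparse.ArgumentParser(
        description="Sketch: select a bug classification scenario"
    )
    parser.add_argument("--mode", choices=["binary", "onevsrest"], default="binary")
    parser.add_argument(
        "--positive-class",
        default=None,
        help="label treated as the positive class in binary mode",
    )
    return parser


def prepare_labels(labels, mode, positive_class=None):
    """Return the training targets for the chosen scenario."""
    if mode == "binary":
        if positive_class is None:
            raise ValueError("binary mode needs a positive class")
        # One class is positive (1); every other class is negative (0).
        return [1 if label == positive_class else 0 for label in labels]
    # Multi-class scenario: keep the original labels. A one-vs-rest
    # classifier would then train one binary model per distinct label.
    return list(labels)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Explicit argument list so the sketch runs the same way everywhere.
    args = build_parser().parse_args(["--mode", "binary", "--positive-class", "ui"])
    labels = ["ui", "network", "ui", "storage"]  # illustrative labels only
    targets = prepare_labels(labels, args.mode, args.positive_class)
    print(targets)
    # Compare against a made-up prediction vector to show the metric.
    print(accuracy([1, 0, 0, 0], targets))
```

With the illustrative inputs above, the binary targets are `[1, 0, 1, 0]` and the made-up prediction vector scores an accuracy of 0.75. The numbers carry no relation to the results reported in the thesis.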

Supervisors: Luca Ardito, Maurizio Morisio
Academic year: 2020/21
Publication type: Electronic
Number of pages: 57
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING (INGEGNERIA INFORMATICA)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/15888