Construction of an AI-based system for the detection of prostate cancer

Laura Lopera Tobon

Supervisors: Gabriella Balestra, Samanta Rosati. Politecnico di Torino, Master's degree programme in Biomedical Engineering (Corso di laurea magistrale in Ingegneria Biomedica), 2023

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Prostate cancer is the second most common cancer in men worldwide and is characterized by the excessive proliferation of cells in the prostate gland. Given this incidence, correct and timely identification of the disease is important. Currently, prostate cancer is definitively diagnosed by biopsy; however, performing a biopsy on every suspected case causes discomfort and can even trigger health problems in patients. For this reason, methods based on clinical variables are being pursued, since they could avoid unnecessary biopsies. The main objective of this thesis was to develop, in MATLAB, an artificial intelligence-based algorithm for predicting prostate cancer. To this end, a dataset of 1621 patients with categorical, numerical, and binary variables was used. The data were first prepared (e.g., completeness and correctness checks, merging of variables). After an analysis of missing values, it was decided to impute them using k-Nearest Neighbors (kNN) and to assess how imputation affected classifier performance. The classifier chosen for cancer prediction was also kNN. In addition, the influence of two important classifier parameters was analyzed: the distance metric and the number of neighbors k. The distances evaluated were the Hamming distance, which is MATLAB's default when the input is a table and which compares string sequences, and the Gower distance, which computes a different distance depending on the variable type and for which a dedicated function was written. For k, the square root of the number of subjects was chosen initially, followed by 75 %, 50 %, and 25 % of that initial value.
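The two tuned parameters can be illustrated with a minimal sketch. It is written in Python rather than MATLAB and is not the thesis's own function: it assumes the common form of the Gower distance (range-normalized absolute difference for numerical variables, simple mismatch for categorical and binary ones, averaged over the variables present in both records). The feature layout, the names `gower_distance` and `candidate_k`, and the rounding of k are all assumptions made for illustration.

```python
import math

def gower_distance(a, b, kinds, ranges):
    """Mean per-variable dissimilarity between two records (assumed Gower form).

    kinds[i] is "num" or "cat" (binary variables are treated as "cat").
    ranges[i] is max - min of numeric variable i over the dataset; it is
    ignored for categorical variables. Variables that are missing (None)
    in either record are skipped.
    """
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue
        if kind == "num":
            # Range-normalized absolute difference, in [0, 1].
            total += (abs(x - y) / rng) if rng else 0.0
        else:
            # Simple mismatch: 0 if equal, 1 otherwise.
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else float("nan")

def candidate_k(n_subjects):
    """sqrt(n) rounded to an integer, then 75 %, 50 %, 25 % of it (rounding assumed)."""
    base = round(math.sqrt(n_subjects))
    return [max(1, round(base * f)) for f in (1.0, 0.75, 0.5, 0.25)]

# Hypothetical records: (age, family history, category label).
kinds = ("num", "cat", "cat")
ranges = (40, None, None)
d = gower_distance((60, "yes", "A"), (70, "no", "A"), kinds, ranges)
ks = candidate_k(1621)  # [40, 30, 20, 10] under the rounding assumed here
```

Whether k was derived from all 1621 subjects or from the training portion only is not stated in the abstract, so the values above are indicative only.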
Classifier performance was evaluated using descriptive metrics computed after training the predictors on a training set and validating them through k-fold cross-validation with 10 folds. Imputation was found to have little influence on the results, which can be explained by the fact that almost all missing fields were imputed with the same value, except for a small share corresponding to less than 3 % of all patients. Appreciable differences were found, however, when the distance was changed: the Hamming distance generalized better and yielded a lower classification error (balanced accuracy close to 71 %). Overall, the results were partially encouraging: although the classifiers did not achieve optimal performance, this research may help the future development of better tools in the field of prostate cancer, especially when the dataset is as heterogeneous as the one used in this project.
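As a rough illustration of the evaluation described above, the sketch below shows how balanced accuracy and a 10-fold split could be computed. It is a Python sketch, not the thesis's MATLAB code: it assumes a binary label (1 = cancer, 0 = no cancer), takes balanced accuracy to be the mean of sensitivity and specificity, and uses a plain shuffled split because the abstract does not say whether the folds were stratified. The function names and the seed are hypothetical.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive).

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def kfold_indices(n, n_folds=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

# Toy labels: 4 positives (3 detected), 2 negatives (1 correctly rejected).
score = balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])  # 0.625

# Each fold serves once as the validation set; the rest form the training set.
folds = kfold_indices(1621)
```

Balanced accuracy is a reasonable headline metric when the two classes are unequal in size, because a classifier that always predicts the majority class scores only 50 %.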

Supervisors: Gabriella Balestra, Samanta Rosati
Academic year: 2023/24
Publication type: Electronic
Number of pages: 65
Subjects:
Degree programme: Master's degree in Biomedical Engineering (Corso di laurea magistrale in Ingegneria Biomedica)
Degree class: New system > Master's degree > LM-21 - Biomedical Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/28920