Design of a CNN-based method for classifying subtype of kidney cancers using miRNA isoform profiles

Marco De Franchis

Supervisors: Gianvito Urgese, Elisa Ficarra, Marta Lovino. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2020

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

The aim of this thesis is to exploit microRNA isoform expression profiles and Artificial Intelligence (AI) tools to classify samples from different cancer studies. MicroRNAs (miRNAs) are small non-coding RNA molecules of 19-22 nucleotides that regulate gene expression via base-pairing with complementary sequences within mRNA molecules. Each miRNA sequence can occur with modifications that may influence the final behavior of the molecule; each such variant is called an isoform. Thanks to the evolution of sequencing technologies, an increasing amount of miRNA expression data has been released. The Cancer Genome Atlas (TCGA) is one of the projects that collect this kind of data. Studies carried out on tumor and healthy samples showed differential expression of miRNAs between the two categories, in particular for those miRNA families related to oncogenic or tumor-suppressor gene pathways. The growing availability of such data, together with current AI tools, allows us to design more powerful classification tools for tumor identification.

On this basis, I decided to use miRNA isoform expression profiles as the input of Convolutional Neural Networks (CNNs) to predict malignancy in biological samples. To this end, I selected the cancer studies on TCGA with the highest number of normal samples relative to the available tumor samples, namely Kidney Renal Papillary Cell Carcinoma (KIRP), Kidney Renal Clear Cell Carcinoma (KIRC) and Kidney Chromophobe (KICH). The number of samples varies among the subtypes, and the imbalance between tumor and healthy samples reaches up to an order of magnitude.

To obtain the miRNA isoform expression profiles, I considered two alignment tools separately and created one dataset from each:

1. Starting from the original TCGA alignment tool, I created a table for each sample reporting its identified miRNAs in the rows and the expression of 4 detectable isoforms in the columns.
2. From the alignment tool isomiR-SEA, which identifies a greater number of isoforms, I also created a table for each sample, with miRNAs in the rows and up to 10 detectable isoforms in the columns.

Finally, for each table in the two datasets, a column reporting the total expression of each miRNA was added.

In the second part, I developed a system that, taking these two datasets as input, tries to classify samples from the same tissue into one of four classes, namely the three cancer types and the healthy samples. The system compares the two datasets (which represent different levels of miRNA expression) and measures their effectiveness in classification tasks. I divided the samples of the three cancer studies into a training set, used to train the classifier, and a test set, used together with cross-validation to measure performance. Different configurations of the input data (isoforms and miRNAs) and of the classifiers (multiclass and binary, the latter as tumor subtype vs. tumor subtype and normal vs. tumor subtype) were tested.

Binary classifiers reported better results (up to 95% test accuracy) than multiclass classifiers (up to 65% test accuracy). For this reason, I decided to combine different binary classifiers into a tree classifier that separates the 4 classes. This technique leads to better results than the simple multiclass approach (up to 72% test accuracy). A significant improvement came from treating the normal samples of each cancer study as a distinct class. This led to a total of 6 classes, for which the tree classifier structure reached a test accuracy of up to 84%.
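
As an illustration of how binary classifiers can be chained into a tree that separates the four classes, the following is a minimal Python sketch. It is not the implementation used in the thesis: the order of the splits (normal vs. tumor first, then KICH, then KIRC vs. KIRP), the `Node` and `predict` names, and the threshold functions that stand in for trained binary CNNs are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Sample = Sequence[float]


@dataclass
class Node:
    """Internal tree node wrapping one binary classifier (hypothetical helper)."""
    decide: Callable[[Sample], bool]  # True routes to `yes`, False routes to `no`
    yes: Union["Node", str]           # a subtree or a final class label
    no: Union["Node", str]


def predict(node: Union[Node, str], sample: Sample) -> str:
    """Walk the tree, applying one binary decision per level, until a label is reached."""
    while isinstance(node, Node):
        node = node.yes if node.decide(sample) else node.no
    return node


# Stand-ins for trained binary CNNs: each one thresholds a single toy feature.
def is_normal(s: Sample) -> bool:
    return s[0] > 0.5


def is_kich(s: Sample) -> bool:
    return s[1] > 0.5


def is_kirc(s: Sample) -> bool:
    return s[2] > 0.5


# Assumed split order; the abstract does not specify the tree layout.
tree = Node(is_normal, "Normal",
            Node(is_kich, "KICH",
                 Node(is_kirc, "KIRC", "KIRP")))

if __name__ == "__main__":
    for sample in ([0.9, 0.0, 0.0], [0.1, 0.8, 0.0], [0.1, 0.2, 0.7], [0.0, 0.0, 0.0]):
        print(sample, "->", predict(tree, sample))
```

In a structure like this each internal node only has to solve a two-class problem, which is the setting where the abstract reports the highest accuracies. The six-class variant would add leaves that keep the normal samples of each study separate.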

Supervisors: Gianvito Urgese, Elisa Ficarra, Marta Lovino
Academic year: 2019/20
Publication type: Electronic
Number of pages: 77
Subjects:
Degree programme: Master's degree programme in Biomedical Engineering
Degree class: New system > Master's degree > LM-21 - BIOMEDICAL ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/13798