Politecnico di Torino

Novel Neural Techniques for Gene Expression Analysis in Cancer Prognosis

Pietro Barbiero

Novel Neural Techniques for Gene Expression Analysis in Cancer Prognosis. Supervisors: Elio Piccolo, Giansalvo Cirrincione, Alberto Paolo Tonda. Politecnico di Torino, Master's degree in Mathematical Engineering (Corso di laurea magistrale in Ingegneria Matematica), 2019.

License: Creative Commons Attribution Non-commercial No Derivatives.

This manuscript summarises two years of analyses, experiments and developments in the field of machine learning. During that period, the authors collaborated on devising novel ideas and applying them to real-world problems. The main application setting is the analysis of patient-derived xenografts (PDXs) of metastatic colorectal cancer (mCRC). PDXs are obtained by propagating surgically derived tumor specimens in immunocompromised mice. Through this procedure, cancer cells remain viable ex vivo and retain the typical characteristics of different tumors from different patients. Hence, PDXs can effectively recapitulate the intra- and inter-tumor heterogeneity found in real patients. Over the last decade, the Candiolo Cancer Institute (IRCC, Italy) has been assembling the largest collection of mCRC PDXs available worldwide in an academic environment. This resource has been extensively characterized at the molecular level and annotated for response to therapies, including cetuximab, an anti-EGFR antibody approved for clinical use.

The mCRC PDX samples analyzed in this manuscript were kindly provided by IRCC in the form of microarray data, i.e. a large table containing the gene expression levels of tumor cells. The medical objectives of the analyses described in this work are twofold: on the one hand, extracting gene expression patterns that can inform therapies and further clinical experiments; on the other hand, building models capable of correctly classifying unlabeled samples according to their response to drugs. From a statistical and machine learning point of view, the main difficulty in dealing with such data is the so-called curse of dimensionality: only a few hundred PDXs (samples) were available against tens of thousands of gene expression levels (features).
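The curse of dimensionality can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the thesis. It assumes synthetic Gaussian noise in place of real microarray data, hypothetical sizes (200 training samples, 10,000 features), and a plain minimum-norm least-squares linear classifier; the helper names `fit_min_norm` and `accuracy` are hypothetical. With far more features than samples, such a model can fit completely random labels perfectly, yet it performs at chance level on new samples. A perfect training score therefore says nothing about generalization in this regime.

```python
import numpy as np


def fit_min_norm(X, y):
    """Minimum-norm weights w solving X @ w = y (exact when n < p)."""
    # With n < p and X of full row rank, the n x n matrix X X^T is invertible,
    # so w = X^T (X X^T)^{-1} y interpolates the training labels exactly.
    return X.T @ np.linalg.solve(X @ X.T, y)


def accuracy(X, y, w):
    """Fraction of samples whose predicted sign matches the +/-1 label."""
    return float(np.mean(np.sign(X @ w) == y))


rng = np.random.default_rng(0)
# Hypothetical sizes: few samples, many features (p >> n).
n_train, n_test, p = 200, 300, 10_000

# Pure noise: the features carry no information about the labels.
X_train = rng.standard_normal((n_train, p))
y_train = rng.choice([-1.0, 1.0], size=n_train)
X_test = rng.standard_normal((n_test, p))
y_test = rng.choice([-1.0, 1.0], size=n_test)

w = fit_min_norm(X_train, y_train)
train_acc = accuracy(X_train, y_train, w)
test_acc = accuracy(X_test, y_test, w)
print(f"train accuracy: {train_acc:.2f}")  # perfect fit on random labels
print(f"test accuracy:  {test_acc:.2f}")   # close to chance (about 0.5)
```

Shrinking `p` below `n_train` makes the exact fit impossible in general, which is one way to see that the problem stems from the samples-to-features ratio rather than from the particular model.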
Preliminary analyses performed with state-of-the-art techniques fared poorly on this problem, yielding limited effectiveness and opaque models. Thus, the machine learning objectives were to improve the effectiveness of existing models and to design ad hoc techniques for high-dimensional data. Analyses were performed with both supervised and unsupervised methods, and the work was split into two distinct lines, one for each family of methods. This split, decided at the beginning of the work, also mirrors the division of the medical objectives: roughly speaking, gene expression patterns are commonly extracted with unsupervised techniques, while building a classifier requires supervised techniques. Nonetheless, supervised learning also provided meaningful insight into gene behaviour.

Supervisors: Elio Piccolo, Giansalvo Cirrincione, Alberto Paolo Tonda
Academic year: 2018/19
Publication type: Electronic
Number of Pages: 121
Degree programme: Master's degree in Mathematical Engineering (Corso di laurea magistrale in Ingegneria Matematica)
Degree class: New organization > Master of Science > LM-44 - MATHEMATICAL MODELLING FOR ENGINEERING
Collaborating companies: unspecified
URI: http://webthesis.biblio.polito.it/id/eprint/11186