
From sequence to gene expression: a Deep Learning approach to evaluate miRNAs' effect

Elena Pianfetti

Supervisors: Elisa Ficarra, Marta Lovino. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2021

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Proteins perform most of the cell's functions, and each protein must be present in a certain amount for the cell to work correctly. One way to control the amount of protein produced is to control the number of mRNA molecules produced during transcription. The gene sequence contains both the part that is transcribed to RNA and the information that determines how much of the gene product is produced. The latter part is called a regulatory sequence, and understanding how it works could lead to better predictions of mRNA expression. In 2020, Agarwal and Shendure developed the Xpresso model, which predicts the amount of mRNA from the DNA sequence of the promoter region of a protein-coding gene. The Xpresso model is based on a Convolutional Neural Network (CNN) architecture and benefits from additional features associated with the sequence (GC contents and lengths of the 5' UTR, Open Reading Frame, and 3' UTR regions, exon junction density, and intron length). In addition, they used the residuals of the prediction to understand the impact of transcriptional and post-transcriptional regulatory mechanisms such as enhancers, microRNAs, and heterochromatic domains. MicroRNAs (miRNAs) are short RNA molecules that act post-transcriptionally on mRNAs, either by preventing them from being translated into proteins or by destroying them. However, the Xpresso model does not take the miRNAs' effect as input. Therefore, this thesis aims to add miRNA expression to the model to improve mRNA expression predictions. Specifically, this thesis works with a dataset of medulloblastoma, a brain cancer. The samples belong to four medulloblastoma subclasses: group 3, group 4, sonic hedgehog (SHH), and WNT. miRNAs do not act on all genes at once: each miRNA acts on a specific set of target genes, and miRNAs also differ in their expression values. Therefore, not every miRNA affects gene expression in the same way.
Some miRNAs have so little impact that the model cannot learn their function, and as a result they add noise to the model. I tested different methods to choose the best set of miRNAs to predict gene expression. Then, I compared the results in five conditions:
1. without using the miRNAs;
2. using all the miRNAs that have a target;
3. using miRNAs whose cumulative weighted context++ score (CWCS) is correlated to the residuals of the prediction without miRNAs;
4. using miRNAs whose CWCS is correlated to the absolute value of the residuals of the prediction without miRNAs;
5. using miRNAs known to be specifically expressed in these cancer subtypes.
I evaluated these methods on each of the four subclasses independently and on all the samples together (in this case the label is the mean gene expression value over all the samples, regardless of subclass). The model with the worst result was always the one that used all the miRNAs with targets. The best result was obtained using the correlation to the residuals in three cases (mean, group 3, and WNT). For group 4, the best method was the one that used miRNAs correlated to the absolute value of the residuals. For the SHH class, two methods shared the best result: the one that did not use miRNAs and the one that used miRNAs correlated to the absolute value of the residuals. In conclusion, miRNAs can be used to improve gene expression predictions, but the right miRNAs have to be chosen.
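The correlation-based selection used in conditions 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the correlation measure or the cutoff, so the sketch assumes a Pearson correlation computed across genes for each miRNA and an arbitrary threshold. The function name `select_mirnas`, its parameters, and the layout of the CWCS matrix (genes in rows, miRNAs in columns, 0 where a miRNA has no target site) are hypothetical.

```python
import numpy as np


def select_mirnas(cwcs, residuals, threshold=0.1, use_abs_residuals=False):
    """Pick miRNAs whose CWCS column correlates with the model residuals.

    cwcs: (n_genes, n_mirnas) array of cumulative weighted context++ scores
          (assumed layout; 0 where a miRNA does not target a gene).
    residuals: (n_genes,) residuals of the prediction made without miRNAs.
    threshold: assumed cutoff on |Pearson r|; the abstract gives no value.
    use_abs_residuals: if True, correlate against |residuals| (condition 4).
    Returns (indices of selected miRNAs, Pearson r for every miRNA).
    """
    cwcs = np.asarray(cwcs, dtype=float)
    target = np.asarray(residuals, dtype=float)
    if use_abs_residuals:
        target = np.abs(target)

    # Centre both sides, then compute Pearson r for every column at once.
    target_c = target - target.mean()
    cwcs_c = cwcs - cwcs.mean(axis=0)
    num = cwcs_c.T @ target_c
    den = np.sqrt((cwcs_c ** 2).sum(axis=0) * (target_c ** 2).sum())

    # Constant columns (e.g. a miRNA with no targets) get r = 0.
    r = np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    selected = np.flatnonzero(np.abs(r) >= threshold)
    return selected, r


# Toy usage with synthetic data (not from the thesis).
rng = np.random.default_rng(0)
toy_residuals = rng.normal(size=200)
toy_cwcs = np.column_stack([
    -0.5 * toy_residuals + 0.01 * rng.normal(size=200),  # informative miRNA
    rng.normal(size=200),                                # unrelated miRNA
    np.zeros(200),                                       # miRNA with no targets
])
selected, r = select_mirnas(toy_cwcs, toy_residuals, threshold=0.5)
print("selected miRNA columns:", selected)
```

Only the selected columns would then be passed to the model as additional inputs. Setting `use_abs_residuals=True` gives the variant that looks for miRNAs associated with large errors in either direction.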

Supervisors: Elisa Ficarra, Marta Lovino
Academic year: 2021/22
Publication type: Electronic
Number of pages: 71
Subjects:
Degree programme: Master's degree programme in Biomedical Engineering
Degree class: New system > Master's degree > LM-21 - Biomedical Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/20182