
Development of a graph reduction method for minimizing RNA secondary structures and speeding-up sequence-structure alignment algorithms

Rossella Resta


Supervisor: Elisa Ficarra. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2018

PDF (thesis), 2 MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

RNA is, along with DNA and proteins, one of the most important molecules regulating the life of cells. Several types of RNA molecules take part in gene expression, and the structure they assume is fundamental to their role. Unlike DNA, RNA can fold into intricate secondary and tertiary structures. Non-coding RNAs are a class of particular importance. Knowing the structure of these molecules helps to understand their function and how they influence gene expression. Currently, the secondary structures of these molecules cannot be determined exactly in vivo. For this reason, several prediction methods have been implemented in a number of tools. These methods apply different physical principles (e.g. minimum free energy) using various algorithmic approaches. The goal of this thesis is to develop a tool that takes as input the secondary structures predicted by several tools and computes the consensus among the different predictions, pruning the less relevant and less probable interactions between the molecule's nucleotides.

Cosmo, the main program developed for this thesis, is written in C++ following the SeqAn code style, with some auxiliary Python blocks. It computes a consensus structure module for RNA secondary structures by merging the output structures of three prediction tools: IPknot (with six configurations), RNAfold and RNAstructure, the latter two also integrating experimental data. The consensus structure is a graph whose vertices are the nucleotides and whose edges represent the interactions between them. In the consensus, the weight of each edge is proportional to the number of tools that predict that interaction. Another type of structure is the base pair probability matrix: a graph in which each vertex is connected to all the others and the edge weights are the interaction probabilities.
This structure is computed by RNAfold and is currently the input of Lara, which computes sequence-structure alignments of RNA molecules. The objective of Cosmo is to provide a lighter but equally accurate input to the sequence-structure alignment tools Lara and LocARNA, reducing the computational time of these programs, which for many sequences can be on the order of hours. Cosmo was tested on two sequence libraries: the RNA Mapping Database (RMDB), which contains experimental data, and BRAliBase. LocARNA was used to validate the improvement in the alignment. The results show that, when computing the consensus structure of 50 RMDB sequences and 476 BRAliBase sequences, the number of edges in the consensus decreases significantly (by 98.3% for the first library and 96.9% for the second) compared with the base pair probability matrix. In the second experiment, this leads to a clear decrease in the computational time of the sequence-structure alignment computed by LocARNA, which is sped up by 69.4%. Moreover, evaluating the quality of further sequence-structure alignments shows that the quality of the alignment is preserved after Cosmo's aggregation and pruning. Consequently, providing a lighter input to sequence-structure alignment yields a clear speed-up in its computation without affecting the quality of the results.

Supervisor: Elisa Ficarra
Academic year: 2017/18
Publication type: Electronic
Number of pages: 73
Degree programme: Master's degree programme in Biomedical Engineering
Degree class: New regulations > Master's degree > LM-21 - Biomedical Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/8010