Politecnico di Torino

Generative statistical models for RNA sequences

Francesco Calvanese

Generative statistical models for RNA sequences.

Supervisors: Andrea Pagnani, Martin Weigt. Politecnico di Torino, Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi), 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.


The design of artificial biomolecules with given biological functions has become one of the main interests of biotechnology and bioengineering in recent years. One of the goals in this field is to improve natural molecules: the aim is to design artificial molecules that have the functionality of natural ones while being more stable, efficient, or resistant. Recent advances in sequencing technology have significantly accelerated the production of biological data and increased the amount available, so it is now finally possible to apply data-driven approaches to this problem. Generative models are tools from machine learning or statistical learning used to generate artificial molecules that mimic the statistical features of natural ones, in the hope of also reproducing their biological functionality. There are several examples in the literature where these tools have already been applied successfully to proteins. In this thesis we apply them to RNA, either designing new model architectures or adapting existing ones. The generative models treated are inspired by statistical physics. We use inverse statistical physics to build Potts models from which we sample artificial data. We test and compare several models, with special consideration for interpretability, since by analyzing the parameters of a good model we can deepen our understanding of the biophysics of RNA sequences. Compared to proteins, the study of RNA has the advantage that information on RNA secondary structure is easily accessible and that there are efficient and precise algorithms for its prediction. We used secondary structure tests on our artificial sequences as an indicator of correct biological functionality. Furthermore, we used the information obtained from our models to build or improve structure prediction algorithms.
We conclude that, possibly after refinements based on experimental tests, generative sequence models are good candidates for the design of artificial RNA sequences.
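
As an illustration of what sampling artificial sequences from a Potts model can look like, here is a minimal sketch. It does not reproduce the models or inference procedures of the thesis. It assumes a five-symbol alphabet (four nucleotides plus an alignment gap), fields `h` and couplings `J` drawn at random rather than learned from data, and a plain Gibbs sampler. All function names are hypothetical.

```python
import numpy as np

ALPHABET = "ACGU-"  # assumed encoding: four nucleotides plus an alignment gap
Q = len(ALPHABET)


def random_parameters(L, rng, scale=0.5):
    """Toy fields h[i, a] and couplings J[i, j, a, b] (random, NOT learned)."""
    h = rng.normal(scale=scale, size=(L, Q))
    J = rng.normal(scale=scale, size=(L, L, Q, Q))
    # Symmetrize so that J[i, j, a, b] == J[j, i, b, a].
    J = 0.5 * (J + J.transpose(1, 0, 3, 2))
    for i in range(L):
        J[i, i] = 0.0  # no self-coupling
    return h, J


def potts_energy(seq, h, J):
    """E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j]."""
    L = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return energy


def local_logits(seq, i, h, J):
    """Log-weights (up to a constant) of each symbol at site i, others fixed."""
    logits = h[i].copy()
    for j in range(len(seq)):
        if j != i:
            logits += J[i, j, :, seq[j]]
    return logits


def gibbs_sample(h, J, n_sweeps, rng):
    """Draw one sequence approximately from P(s) proportional to exp(-E(s))."""
    L = h.shape[0]
    seq = rng.integers(Q, size=L)  # random initial sequence
    for _ in range(n_sweeps):
        for i in range(L):
            logits = local_logits(seq, i, h, J)
            p = np.exp(logits - logits.max())  # subtract max for stability
            p /= p.sum()
            seq[i] = rng.choice(Q, p=p)
    return seq


def decode(seq):
    """Map integer symbols back to characters."""
    return "".join(ALPHABET[s] for s in seq)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, J = random_parameters(L=12, rng=rng)
    sample = gibbs_sample(h, J, n_sweeps=50, rng=rng)
    print(decode(sample), potts_energy(sample, h, J))
```

In an inverse statistical physics setting, `h` and `J` would instead be inferred from an alignment of natural sequences so that samples reproduce its statistics; the random parameters above only serve to make the sketch runnable.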

Supervisors: Andrea Pagnani, Martin Weigt
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 52
Degree programme: Master's degree in Physics Of Complex Systems (Fisica Dei Sistemi Complessi)
Degree class: New organization > Master science > LM-44 - MATHEMATICAL MODELLING FOR ENGINEERING
Collaborating organizations: Sorbonne Université
URI: http://webthesis.biblio.polito.it/id/eprint/19139