Giovanni Peinetti
Integrating New Data into Generative Models of Biomolecular Sequences.
Supervisors: Andrea Pagnani, Martin Weigt. Politecnico di Torino, Master's degree programme in Physics of Complex Systems (Fisica Dei Sistemi Complessi), 2024
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
The design of functional artificial biomolecules has been one of the main interests of biotechnology in recent years. The aim is to design sequences that have the same functionality as natural ones and comparable features. Data-driven approaches are among the most successful strategies. In machine learning, generative statistical models are tools for generating artificial biomolecular sequences. They are trained on Multiple Sequence Alignments of homologous families, which consist of positive unlabelled sequences. The literature contains several examples in which generative models have been built successfully to generate functional RNA and proteins. Relying on the maximum entropy principle, Direct Coupling Analysis (DCA) models take the form of the Boltzmann distribution from statistical physics. They are built by learning a Potts model from data via maximum likelihood, and they can be used to sample artificial sequences.
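To make the last two sentences concrete, here is a minimal sketch of how a Potts model assigns a Boltzmann weight P(a) ∝ exp(-E(a)) to a sequence and how sequences can be sampled from it. It is an illustration only, not the implementation used in the thesis: the function names `potts_energy` and `gibbs_sample`, the toy sizes, and the random fields `h` and couplings `J` are all assumptions. In an actual DCA workflow the parameters would be fitted to a Multiple Sequence Alignment by maximum likelihood, a step not shown here, and Gibbs sampling is just one common choice of sampler.

```python
import numpy as np


def potts_energy(seq, h, J):
    """Potts energy E(a) = -sum_i h[i, a_i] - sum_{i<j} J[i, j, a_i, a_j]."""
    L = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return energy


def gibbs_sample(h, J, n_sweeps, rng):
    """Draw one sequence from P(a) ~ exp(-E(a)) by Gibbs sampling.

    Assumes J is symmetric in the sense J[i, j, a, b] == J[j, i, b, a].
    """
    L, q = h.shape
    seq = rng.integers(q, size=L)  # random initial sequence
    for _ in range(n_sweeps):
        for i in range(L):
            # Log-weight of each of the q states at site i, other sites fixed.
            logits = h[i].copy()
            for j in range(L):
                if j != i:
                    logits += J[i, j, :, seq[j]]
            p = np.exp(logits - logits.max())  # subtract max for stability
            p /= p.sum()
            seq[i] = rng.choice(q, p=p)
    return seq


# Toy parameters (hypothetical): L sites, q states per site.
rng = np.random.default_rng(0)
L, q = 4, 3
h = rng.normal(size=(L, q))
J = np.zeros((L, L, q, q))
for i in range(L):
    for j in range(i + 1, L):
        J[i, j] = rng.normal(scale=0.5, size=(q, q))
        J[j, i] = J[i, j].T  # enforce the symmetry assumed above

sample = gibbs_sample(h, J, n_sweeps=50, rng=rng)
energy = potts_energy(sample, h, J)
```

For real biomolecular families, q would be the alphabet size (for example 21 for amino acids plus a gap symbol, or 5 for nucleotides plus a gap) and L the alignment length.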
