Modeling the stochastic dynamics of protein evolution experiments using protein sequence landscapes

Leonardo Di Bari

Modeling the stochastic dynamics of protein evolution experiments using protein sequence landscapes.

Supervisors: Andrea Pagnani, Martin Weigt. Politecnico di Torino, Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi), 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Proteins are fundamental macromolecules involved in a variety of vital functions in living organisms. They consist primarily of a linear amino-acid sequence, which allows the molecule to fold into a 3D structure and to perform its function thanks to its chemical and physical properties. In this work, we are interested in understanding the statistics of protein sequences and their evolution over different timescales. An interplay of mutation and selection shapes amino-acid variability over the course of evolutionary history. Understanding the stochastic dynamics of protein evolution is essential for comprehending the diversification of life and the emergence of new protein functions. Recently, data-driven fitness landscapes and statistical-physics methods have become increasingly important in building a quantitative theory of protein evolution, with promising results. Our aim is to simulate protein evolution numerically and to compare the results with features of experimental and natural data, such as Hamming distance, contact prediction, and correlation statistics of several orders. A Markov chain describes the mutational dynamics, and a "sequence landscape" constructed from naturally occurring sequence variants models selection. The simulations start from a specific wild-type protein, and new proteins are designed using previously learned parameters to reproduce different stages of natural and in vitro evolution: experimental evolution at different rounds (10% of the sequence mutated) and natural evolution as observed in homologous sequences (70-80% sequence variation). After confirming the validity of our generative model both locally and globally, we search for the emergence of epistatic signals at intermediate scales, with the goal of building an intuition for the timescales that govern the dynamics.
In conclusion, this project aims to provide relevant insights on the stochastic dynamics of protein evolution, which is essential to understanding the diversity of life.
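As an illustration of the kind of dynamics described in the abstract, the following minimal sketch evolves a sequence from a wild type with a Markov chain and records its Hamming distance from the starting point. Everything specific here is an assumption made for illustration and is not taken from the thesis: the landscape is written as a Potts-like energy with fields `h` and couplings `J`, the move is a single-site Metropolis step, the alphabet size `Q = 21` and the inverse temperature `beta` are placeholder choices, and the parameters are random numbers rather than values learned from natural sequences.

```python
import numpy as np

Q = 21  # assumed alphabet size: 20 amino acids plus a gap symbol


def energy(seq, h, J):
    """Potts-like energy of a sequence; lower energy stands for higher assumed fitness."""
    L = len(seq)
    e = -h[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return int(np.sum(a != b))


def evolve(wild_type, h, J, n_steps, beta=1.0, seed=0):
    """Single-site Metropolis chain started from the wild type.

    Returns the final sequence and the Hamming distance from the
    wild type after every step.
    """
    rng = np.random.default_rng(seed)
    seq = wild_type.copy()
    e = energy(seq, h, J)
    distances = []
    for _ in range(n_steps):
        # Mutation: propose a random amino acid at a random site.
        proposal = seq.copy()
        proposal[rng.integers(len(seq))] = rng.integers(Q)
        # Selection: accept with the Metropolis probability min(1, exp(-beta * dE)).
        delta = energy(proposal, h, J) - e
        if delta <= 0 or rng.random() < np.exp(-beta * delta):
            seq, e = proposal, e + delta
        distances.append(hamming(seq, wild_type))
    return seq, distances


# Toy usage with random (not learned) parameters, for illustration only.
demo_rng = np.random.default_rng(42)
L_demo = 8
h_demo = demo_rng.normal(size=(L_demo, Q))
J_demo = demo_rng.normal(scale=0.1, size=(L_demo, L_demo, Q, Q))
wt_demo = demo_rng.integers(Q, size=L_demo)
final_seq, dist_trace = evolve(wt_demo, h_demo, J_demo, n_steps=200)
print("final Hamming distance from wild type:", dist_trace[-1])
```

In this sketch the number of steps plays the role of evolutionary time: short chains stay close to the wild type, while long chains drift toward sequences typical of the landscape. Recomputing the full energy at every step is kept for readability; an energy difference restricted to the mutated site would be the natural optimization.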

Supervisors: Andrea Pagnani, Martin Weigt
Academic year: 2022/23
Publication type: Electronic
Number of pages: 45
Subjects:
Degree programme: Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi)
Degree class: New organization > Master's degree > LM-44 - Mathematical-Physical Modelling for Engineering (MODELLISTICA MATEMATICO-FISICA PER L'INGEGNERIA)
Collaborating institutions: Sorbonne Université
URI: http://webthesis.biblio.polito.it/id/eprint/27939