Towards statistical-physics inspired modeling of experimental protein evolution

Matteo Bisardi

Supervisors: Alfredo Braunstein, Martin Weigt, Francesco Zamponi. Politecnico di Torino, Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi), 2020

Licence: Creative Commons Attribution Non-commercial No Derivatives.

Data-driven modelling approaches, including those inspired by the statistical physics of complex and disordered systems, are rapidly gaining importance in modern computational biology. In this report we propose approaches to modelling experimental evolution protocols such as directed evolution, which proceed by alternating cycles of mutation (by error-prone polymerase chain reaction) and selection (e.g. for antibiotic resistance) applied to a protein of interest. Two independent articles have recently shown that sequence ensembles generated in this way can be used to gain important structural and functional information about the studied proteins. However, the potential and the limitations of these experimental approaches are still poorly understood, and the reasons for the significant differences between the two papers remain unclear. Here we address this question from a statistical-physics inspired point of view. We explore data-driven sequence landscapes inferred with Direct-Coupling Analysis (DCA), a method from inverse statistical physics that infers Potts models from multiple-sequence alignments via a maximum-entropy approach. We first check that the sequence data from the two evolution experiments are well described by the DCA sequence landscape and that they are consistent with sampling of sequence space by Gibbs sampling. Exploiting this connection, we simulate evolution protocols in silico, which allows us to assess systematically how important characteristics, such as the number of mutations per sequence, the strength of selection and the number of analysed sequences, affect their use for protein structure prediction. We show that this analysis can explain the different performance of the two recently published experimental works. It also opens the way to providing experimentalists a priori with the parameters that should be used in the experiments.
We also anticipate that our modelling approaches can be made more precise by a more realistic description of the evolutionary dynamics, e.g. by modelling mutations at the level of the DNA sequence instead of the protein's amino-acid sequence as done in this report, thereby allowing an intense exchange between models and experiments.
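
As a rough illustration of the idea summarised above (sampling sequence space from a Potts model by Gibbs sampling), the following minimal sketch may help. It is not the thesis implementation: the fields and couplings are random toy values rather than parameters inferred by DCA from an alignment, the inverse temperature `beta` is used here only as an assumed stand-in for selection strength, and all function names (`random_potts`, `local_field`, `energy`, `gibbs_step`, `evolve`) are hypothetical.

```python
import math
import random


def random_potts(L, q, rng, scale=0.5):
    """Toy Potts model with random fields h and couplings J (NOT inferred by DCA)."""
    h = [[rng.gauss(0.0, scale) for _ in range(q)] for _ in range(L)]
    J = [[None] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            block = [[rng.gauss(0.0, scale) for _ in range(q)] for _ in range(q)]
            J[i][j] = block
            # Symmetric storage: J[j][i][b][a] == J[i][j][a][b]
            J[j][i] = [[block[b][a] for b in range(q)] for a in range(q)]
    return h, J


def local_field(seq, i, a, h, J):
    """Contribution of symbol a at site i: h_i(a) + sum_{j != i} J_ij(a, a_j)."""
    return h[i][a] + sum(J[i][j][a][seq[j]] for j in range(len(seq)) if j != i)


def energy(seq, h, J):
    """Potts energy E(a) = -sum_i h_i(a_i) - sum_{i<j} J_ij(a_i, a_j)."""
    L = len(seq)
    e = -sum(h[i][seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i][j][seq[i]][seq[j]]
    return e


def gibbs_step(seq, h, J, beta, rng):
    """Resample one random site in place from P(a_i | rest) ~ exp(beta * local field)."""
    q = len(h[0])
    i = rng.randrange(len(seq))
    fields = [local_field(seq, i, a, h, J) for a in range(q)]
    top = max(fields)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (f - top)) for f in fields]
    seq[i] = rng.choices(range(q), weights=weights)[0]


def evolve(start, h, J, n_steps, beta, rng):
    """Return a copy of `start` after n_steps single-site Gibbs updates."""
    seq = list(start)
    for _ in range(n_steps):
        gibbs_step(seq, h, J, beta, rng)
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    L, q = 8, 4  # toy sizes; real proteins use q = 21 (20 amino acids + gap)
    h, J = random_potts(L, q, rng)
    wild_type = [rng.randrange(q) for _ in range(L)]
    # A small in silico "library": independent trajectories from one start sequence
    library = [evolve(wild_type, h, J, n_steps=20, beta=1.0, rng=rng) for _ in range(5)]
    for s in library:
        print(s, round(energy(s, h, J), 3))
```

In this sketch the number of Gibbs steps loosely plays the role of the number of mutation attempts per sequence and the number of trajectories that of the number of analysed sequences. How these map onto real experimental parameters is left open here.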

Supervisors: Alfredo Braunstein, Martin Weigt, Francesco Zamponi
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 47
Degree programme: Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi)
Degree class: New organization > Master science > LM-44 - MATHEMATICAL MODELLING FOR ENGINEERING
Joint supervision institution: Universite de Paris 7 - Denis Diderot (France)
Collaborating organisations: Sorbonne Universita
URI: http://webthesis.biblio.polito.it/id/eprint/15307