Maximum entropy modelling for inference in biological sequences analysis

Matteo De Leonardis

Maximum entropy modelling for inference in biological sequences analysis.

Supervisor: Andrea Pagnani. Politecnico di Torino, Master's degree course in Physics Of Complex Systems (Fisica Dei Sistemi Complessi), 2021

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Likelihood maximization and entropy maximization are two common techniques for inferring the parameters of a probability distribution. In recent years, they have shown outstanding performance on inference problems in structural biology based on sequence data. My work addresses two main aspects of this subject. The first is the prediction of contacts in a protein family through the analysis of correlations between residues. Standard information-theoretic methods based on local correlation measures (e.g. mutual information), which are routinely used to evaluate the correlation between two random variables, often fail because they cannot disentangle direct from indirect interactions between variables.
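The limitation of local measures can be illustrated with a toy model. The following is a minimal sketch, not taken from the thesis: it assumes three binary variables A, B, C arranged in a chain, with a hypothetical coupling strength `J` acting only on the pairs (A, B) and (B, C). The mutual information between A and C is nonetheless positive, although the two variables do not interact directly; conditioning on B removes it.

```python
import itertools
import numpy as np


def mutual_information(p_xy):
    """Mutual information (in nats) of a 2-D joint probability table."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of the column variable
    independent = p_x @ p_y                 # product of the two marginals
    mask = p_xy > 0                         # skip zero entries (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / independent[mask])))


# Assumed toy model: P(a, b, c) proportional to exp(J*a*b + J*b*c), spins in {-1, +1}.
J = 1.0  # hypothetical coupling strength, chosen for illustration only
states = list(enumerate([-1, 1]))
joint = np.zeros((2, 2, 2))
for (ia, a), (ib, b), (ic, c) in itertools.product(states, repeat=3):
    joint[ia, ib, ic] = np.exp(J * a * b + J * b * c)
joint /= joint.sum()

mi_ab = mutual_information(joint.sum(axis=2))  # directly coupled pair
mi_bc = mutual_information(joint.sum(axis=0))  # directly coupled pair
mi_ac = mutual_information(joint.sum(axis=1))  # no direct coupling

# Conditional on each value of B, A and C become independent.
mi_ac_given_b = [
    mutual_information(joint[:, ib, :] / joint[:, ib, :].sum())
    for ib in range(2)
]

print("MI(A;B) =", mi_ab)
print("MI(B;C) =", mi_bc)
print("MI(A;C) =", mi_ac)
print("MI(A;C | B=b) =", mi_ac_given_b)
```

A contact predictor that ranks pairs by mutual information alone would assign a nonzero score to the pair (A, C), which is the kind of indirect signal the abstract refers to.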

To address this, global inference strategies such as entropy maximization can be used to define a quantity called "direct information", which discounts statistical correlations between residues that are not linked to the presence of contacts between them.
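For reference, direct information is commonly written in the direct coupling analysis literature in the form below. The notation is an assumption made here, and the thesis may use different symbols: $e_{ij}(a,b)$ are the inferred pairwise couplings between positions $i$ and $j$, $f_i(a)$ are the empirical single-site frequencies, $q$ is the number of symbols in the alphabet, and the auxiliary fields $\tilde h_i$, $\tilde h_j$ are fixed by requiring that the marginals of $P^{\mathrm{dir}}_{ij}$ equal $f_i$ and $f_j$.

```latex
\begin{align}
P^{\mathrm{dir}}_{ij}(a,b) &= \frac{1}{Z_{ij}}
  \exp\!\left[ e_{ij}(a,b) + \tilde h_i(a) + \tilde h_j(b) \right],
\\
\sum_{b=1}^{q} P^{\mathrm{dir}}_{ij}(a,b) &= f_i(a), \qquad
\sum_{a=1}^{q} P^{\mathrm{dir}}_{ij}(a,b) = f_j(b),
\\
\mathrm{DI}_{ij} &= \sum_{a,b=1}^{q} P^{\mathrm{dir}}_{ij}(a,b)\,
  \ln \frac{P^{\mathrm{dir}}_{ij}(a,b)}{f_i(a)\, f_j(b)} .
\end{align}
```

This has the same form as mutual information, but the empirical pair frequencies are replaced by a two-site distribution that retains only the direct coupling between $i$ and $j$.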

Supervisors

Andrea Pagnani

Academic year

2020/21

Publication type

Electronic

Number of pages

Degree course

Master's degree course in Physics Of Complex Systems (Fisica Dei Sistemi Complessi)

Degree class

New regulations > Master's degree > LM-44 - Mathematical and Physical Modelling for Engineering (MODELLISTICA MATEMATICO-FISICA PER L'INGEGNERIA)

URI

https://webthesis.biblio.polito.it/id/eprint/17915
