Leonardo Ortoleva
A network algorithm for exact tests for Hardy-Weinberg equilibrium with X-chromosomal variants.
Supervisors: Mauro Gasparini, Jan Graffelman. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2020
PDF (Tesi_di_laurea) - Thesis (2MB). License: Creative Commons Attribution Non-commercial No Derivatives.
Archive (ZIP) (Documenti_allegati) - Other (7kB). License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
In statistical genetics, the exact test is of fundamental importance for assessing whether genotypes in a population are distributed according to the Hardy-Weinberg law. It tests for statistical independence of the alleles within the same locus. Genetic variants are usually filtered on the basis of exact test results to remove genotyping errors, which can negatively affect subsequent analyses in genetic epidemiology, ecology and forensics, among other fields. Given this importance, it is useful to understand which tools are currently available to compute the test and how efficient they are. In a bi-allelic context it is possible to use asymptotic tests, such as the chi-square test. When more alleles are taken into account, the data are more sparsely scattered within the genotype matrix and the exact test is the most reliable option. The test computes the probability of the observed genotype counts and derives from it a p-value that indicates whether or not there is evidence for deviation from equilibrium. This yields very high precision, but at the expense of performance. A further step is to consider the X chromosome, where the different genotype structure of the two sexes (males carry only one copy) causes considerable computational difficulties. In this master thesis, the network algorithm available for the autosomes has been extended to the X chromosome, achieving large improvements in computation time with respect to state-of-the-art enumeration algorithms by storing intermediate results instead of recomputing them and by exploiting recursion. We analysed complete chromosomes of the 1000 Genomes Project in order to evaluate exact algorithms and results for autosomal and X-chromosomal variants with two or more alleles. The results of the new algorithm were first verified against the few tools currently available, with excellent agreement.
A subsequent performance analysis highlighted the usefulness of the result, with computation time reduced by several orders of magnitude while still producing exact values. In some cases, an analysis that took more than 6 hours was reduced to a few seconds with the same output values. This is a substantial step forward that delivers a new tool enabling efficient exact Hardy-Weinberg testing for X-chromosomal variants with up to at least five alleles. |
---|---|
Supervisors: | Mauro Gasparini, Jan Graffelman |
Academic year: | 2020/21 |
Publication type: | Electronic |
Number of pages: | 131 |
Subjects: | |
Degree programme: | Master's degree programme in Ingegneria Informatica (Computer Engineering) |
Degree class: | New system > Master's degree > LM-32 - COMPUTER ENGINEERING |
Joint supervision institution: | UNIVERSITAT POLITÈCNICA DE CATALUNYA - FIB (SPAIN) |
Collaborating companies: | NOT SPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/15984 |
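As an illustration of the kind of computation the abstract describes, the following is a minimal sketch of an exact test for a bi-allelic X-chromosomal variant by brute-force enumeration. It is not the network algorithm of the thesis; it only shows the baseline idea of computing the probability of the observed genotype counts and summing the probabilities of all samples that are no more likely than the observed one. The function names are hypothetical, and the probability formula is the standard conditional distribution obtained by counting the ways of assigning the pooled alleles to males (one allele each) and females (two alleles each), which is assumed here rather than quoted from the thesis.

```python
from fractions import Fraction
from math import factorial


def sample_probability(m_a, m_b, f_aa, f_ab, f_bb):
    """Probability of a bi-allelic X-chromosomal sample under equilibrium,
    conditional on the allele counts and the numbers of males and females.

    m_a, m_b         : hemizygous male counts for alleles A and B
    f_aa, f_ab, f_bb : female genotype counts
    """
    n_m = m_a + m_b                    # number of males
    n_f = f_aa + f_ab + f_bb           # number of females
    n_a = m_a + 2 * f_aa + f_ab        # total A alleles
    n_b = m_b + 2 * f_bb + f_ab        # total B alleles
    n_t = n_m + 2 * n_f                # total alleles
    num = (factorial(n_a) * factorial(n_b) * factorial(n_m)
           * factorial(n_f) * 2 ** f_ab)
    den = (factorial(m_a) * factorial(m_b) * factorial(f_aa)
           * factorial(f_ab) * factorial(f_bb) * factorial(n_t))
    return Fraction(num, den)


def compatible_samples(n_a, n_m, n_f):
    """Yield every (m_a, m_b, f_aa, f_ab, f_bb) with the given margins."""
    n_b = n_m + 2 * n_f - n_a
    for m_a in range(min(n_m, n_a) + 1):
        m_b = n_m - m_a
        fem_a = n_a - m_a              # A alleles carried by females
        fem_b = n_b - m_b              # B alleles carried by females
        if fem_b < 0:
            continue
        # The heterozygote count must have the same parity as fem_a.
        for f_ab in range(fem_a % 2, min(fem_a, fem_b) + 1, 2):
            yield (m_a, m_b, (fem_a - f_ab) // 2, f_ab, (fem_b - f_ab) // 2)


def exact_x_pvalue(m_a, m_b, f_aa, f_ab, f_bb):
    """Exact p-value: total probability of all samples that are at most
    as probable as the observed one (full enumeration, exact arithmetic)."""
    observed = sample_probability(m_a, m_b, f_aa, f_ab, f_bb)
    n_a = m_a + 2 * f_aa + f_ab
    n_m = m_a + m_b
    n_f = f_aa + f_ab + f_bb
    probabilities = (sample_probability(*s)
                     for s in compatible_samples(n_a, n_m, n_f))
    return sum((p for p in probabilities if p <= observed), Fraction(0))


# Tiny example: one male carrying A and one BB female.
print(exact_x_pvalue(1, 0, 0, 0, 1))  # 1/3
```

The sketch uses exact rational arithmetic to avoid rounding issues when comparing probabilities. Its cost grows quickly with sample size and, in a multi-allelic generalisation, with the number of alleles, which is the kind of enumeration cost the abstract says the network algorithm reduces.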