Politecnico di Torino

Technical variability versus biological heterogeneity in single-cell RNA-sequencing data

Giulia Della Croce Di Dojola

Technical variability versus biological heterogeneity in single-cell RNA-sequencing data.

Supervisors: Enrico Bibbona, Gianluca Mastrantonio. Politecnico di Torino, Master's degree programme in Mathematical Engineering, 2021

License: Creative Commons Attribution Non-commercial No Derivatives.


The study of gene expression provides insight into many biological processes. Gene expression levels can be regarded as "signatures" that characterize the tissues of an organism and help us understand how they are regulated at the molecular level. This makes it possible to monitor normal growth as well as disease development. Gene expression levels are directly linked to the abundance of mRNA fragments within a cell, so these fragments serve as a quantification tool. One of the main techniques for isolating and sequencing mRNA fragments is single-cell RNA-sequencing. Because it enables cell-specific analyses, this technique can also address more complex tissues made up of different cell types. Our analysis aims to estimate the biological variation arising from heterogeneous gene expression levels within a seemingly homogeneous population of cells. This is not straightforward, since single-cell RNA-sequencing is prone to a high level of technical noise, which must be modelled. To this end, we first consider a Bayesian approach proposed in the literature that decomposes the observed variability into three components: the baseline Poisson variance, the variance inflation due to unexplained technical noise, and the biological cell-to-cell variability. Using this methodology, a set of highly variable genes (HVG) is identified by examining the ratio between biological and total variability. We found that the methodology suffers from an identifiability issue that affects the estimate of the so-called capture efficiency parameter. We show that the posterior distribution of this parameter is completely determined by its prior, and we propose a modified version of the methodology that solves the problem. The implications for the correct classification of highly variable genes are then investigated.
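The Bayesian model itself is not spelled out here, so the sketch below assumes a simplified hierarchical Poisson model of our own choosing, not necessarily the one used in the thesis: X | ν, ρ ~ Poisson(ν·μ·ρ), where ν is a cell-level Gamma factor with mean s and variance θ·s² (technical noise) and ρ is a Gamma factor with mean 1 and variance δ (biological over-dispersion). By the law of total variance, Var(X) = s·μ + (s·μ)²·θ + (s·μ)²·δ·(1 + θ), which gives the three components named above. The function names `decompose` and `flag_hvg`, the allocation of the θ·δ interaction term to biology, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VarianceDecomposition:
    """Three-way split of Var(X) for one gene under the hypothetical toy model."""
    poisson: float     # baseline Poisson variance: s*mu
    technical: float   # technical inflation: (s*mu)**2 * theta
    biological: float  # cell-to-cell biological part: (s*mu)**2 * delta * (1 + theta)

    @property
    def total(self) -> float:
        return self.poisson + self.technical + self.biological

    @property
    def biological_fraction(self) -> float:
        """Ratio between biological and total variability (assumes total > 0)."""
        return self.biological / self.total


def decompose(mu: float, s: float, theta: float, delta: float) -> VarianceDecomposition:
    """Moments of X | nu, rho ~ Poisson(nu * mu * rho) in the hypothetical model.

    mu    -- gene-specific mean expression
    s     -- mean of nu, a capture-efficiency-like scaling factor
    theta -- technical over-dispersion: Var(nu) = theta * s**2
    delta -- biological over-dispersion: E(rho) = 1, Var(rho) = delta
    """
    m = s * mu  # E(X); s and mu only ever appear through this product
    return VarianceDecomposition(
        poisson=m,
        technical=m * m * theta,
        # The theta*delta interaction term is assigned to biology (an assumption).
        biological=m * m * delta * (1.0 + theta),
    )


def flag_hvg(genes: Dict[str, Tuple[float, float, float, float]],
             threshold: float = 0.5) -> List[str]:
    """Names of genes whose biological fraction exceeds `threshold` (hypothetical rule)."""
    return [
        name
        for name, (mu, s, theta, delta) in genes.items()
        if decompose(mu, s, theta, delta).biological_fraction > threshold
    ]


if __name__ == "__main__":
    genes = {
        "geneA": (10.0, 0.5, 0.2, 0.0),  # no biological over-dispersion
        "geneB": (10.0, 0.5, 0.2, 1.0),  # strong biological over-dispersion
    }
    print(flag_hvg(genes))  # ['geneB']
```

In this toy model s and μ enter only through the product s·μ, so `decompose(10.0, 0.5, 0.2, 1.0)` and `decompose(20.0, 0.25, 0.2, 1.0)` return identical moments: the counts alone cannot separate a scaling factor from mean expression. This is offered only as a simple illustration of what non-identifiability means, not as the specific issue analysed in the thesis.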

Supervisors: Enrico Bibbona, Gianluca Mastrantonio
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 64
Degree programme: Master's degree programme in Mathematical Engineering
Degree class: New organization > Master of Science > LM-44 - Mathematical Modelling for Engineering
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/20782