scVEMO: Leveraging Single-cell Multiomics Data for Developmental Trajectory Reconstruction in the Embryonic Mouse Brain.

Alessia Leclercq

Supervisors: Stefano Di Carlo, Alessandro Savino, Lorenzo Martini, Roberta Bardini. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Single-cell sequencing has revolutionized the study of gene expression and its phenotypic consequences by enabling the simultaneous profiling of thousands of individual cells. Recent advancements in multimodal single-cell sequencing have further expanded the scope of these techniques, allowing for the integration of transcriptomics, epigenomics, proteomics, and other omic data to obtain a more comprehensive view of cellular states and dynamics. A specific application of multiomics single-cell sequencing is lineage tracing, which provides insight into the developmental process from pluripotent cell populations to fully differentiated states. This thesis proposes scVEMO, a multiomics-based approach to lineage tracing that leverages CellRank and two RNA-velocity estimation techniques: scVelo and MultiVelo, which extends scVelo to changes in chromatin state. scVEMO is validated on the Fresh Embryonic E18 Mouse Brain dataset provided by 10x Genomics. Building on the assumption that lineage commitment is a continuous process in which cells traverse a spectrum of intermediate states, scVEMO builds a k-nearest-neighbor (KNN) graph that connects neighboring cells based on their joint scRNA-seq and scATAC-seq profiles. It then integrates gene expression, promoter peak counts, and RNA-velocity information to direct the graph and compute cell-state transition probabilities. Finally, the CellRank framework is employed to simulate the system and identify terminal states. In particular, CellRank uses Generalized Perron Cluster Cluster Analysis (GPCCA) to coarse-grain the transition probability matrix into a set of macrostates, which represent metastable cellular states or phenotypes. The results of the random walk simulations, together with the identified macrostates, make it possible to compare the models and to assess how effectively each one reconstructs biologically meaningful developmental lineages.
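
The graph-construction and transition-probability steps described above can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the abstract does not give the exact weighting scheme, so the sketch assumes a common velocity-kernel form in which each cell's transition weights are a softmax over the cosine similarity between its velocity vector and the displacement to each neighbor. The function names, the `scale` parameter, and the random placeholder data are all hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of each row of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def transition_matrix(X, V, k=10, scale=10.0):
    """Row-stochastic matrix on the KNN graph, directed by velocities V.

    Assumed weighting (not taken from the thesis): softmax over the cosine
    similarity between a cell's velocity and the displacement to each neighbor.
    """
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    T = np.zeros((n, n))
    for i in range(n):
        delta = X[nbrs[i]] - X[i]  # displacement to each neighbor
        denom = np.linalg.norm(delta, axis=1) * np.linalg.norm(V[i]) + 1e-12
        cos = (delta @ V[i]) / denom
        w = np.exp(scale * (cos - cos.max()))  # shifted for numerical stability
        T[i, nbrs[i]] = w / w.sum()
    return T

# Placeholder data standing in for reduced joint profiles and velocities.
rng = np.random.default_rng(0)
rna = rng.normal(size=(200, 20))   # hypothetical scRNA-seq features
atac = rng.normal(size=(200, 20))  # hypothetical scATAC-seq features
X = np.hstack([rna, atac])         # joint profile per cell
V = rng.normal(size=X.shape)       # hypothetical velocity vectors
T = transition_matrix(X, V, k=10)
```

Each row of `T` is a probability distribution over that cell's neighbors, which is the kind of object a random-walk simulation or a coarse-graining method such as GPCCA takes as input.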
The assessment then extends to cell fate probabilities, which are evaluated through multilineage potential and the average probability of each cell cluster towards the identified terminal states. This additional investigation sheds light on the models' ability to capture cell fate commitment across cellular populations. While the random walk simulations do not resolve cell development at the granular cell-state level, the results show that integrating the epigenomic profiles via scVEMO improves macrostate identification on the UMAP embedding. scVEMO distinguishes an additional terminal state within the neuronal lineage, corresponding to the upper cortical layers. This is a significant improvement over the transcriptomics-only method, which recovers only the deeper cortical layers, and it yields a lineage reconstruction more congruent with the existing biology literature. Additionally, multilineage potential, assessed through the KL divergence between each cell's fate probabilities and the average fate probabilities per lineage across all cells, shows that scVEMO improves cell lineage commitment compared to the scRNA-seq approach. Together, the results demonstrate that multimodal data integration can yield a robust and more informative lineage reconstruction than transcriptomics-only methods.
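
The KL-divergence measure of multilineage potential mentioned above can be sketched as follows. The direction of the divergence (each cell's fate distribution relative to the population average) and the clipping constant `eps` are assumptions, since the abstract does not specify them; `kl_to_average` and the toy fate probabilities are hypothetical.

```python
import numpy as np

def kl_to_average(fate_probs, eps=1e-12):
    """KL divergence of each cell's fate distribution from the average one.

    fate_probs: (n_cells, n_lineages) array whose rows sum to 1.
    Low values mean the cell resembles the population average (less
    committed); high values mean it is biased towards specific lineages.
    """
    P = np.clip(fate_probs, eps, None)
    P = P / P.sum(axis=1, keepdims=True)  # renormalize after clipping
    q = P.mean(axis=0)                    # average fate probability per lineage
    return np.sum(P * np.log(P / q), axis=1)

# Toy example: two committed cells and one nearly uniform (uncommitted) cell.
fate_probs = np.array([
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.34, 0.33, 0.33],
])
kl = kl_to_average(fate_probs)
```

Under these assumptions the nearly uniform cell receives a smaller divergence than either committed cell, so comparing the distribution of `kl` between two models gives a rough view of which one assigns cells more decisively to lineages.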

Supervisors: Stefano Di Carlo, Alessandro Savino, Lorenzo Martini, Roberta Bardini
Academic year: 2023/24
Publication type: Electronic
Number of pages: 130
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New organization > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/31930