DNAfusion: a nf-core pipeline for gene fusion detection on DNA samples

Francesca Miccolis

Supervisors: Santa Di Cataldo, Elisa Ficarra, Marta Lovino. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2022

PDF (Tesi_di_laurea) - Thesis, 7MB
Restricted to: Repository staff only until 27 July 2025 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Next Generation and Long-Read Sequencing technologies have increased the availability of DNA samples. Although DNA samples are computationally and economically more expensive to obtain than RNA samples, their contribution is relevant. Gene fusions are a crucial class of chromosomal aberrations: alterations of the DNA sequence that can generate new chimeric transcripts or deregulate the genes involved. Many tools have been proposed for gene fusion detection on RNA samples. However, at the moment, there is no complete open-source pipeline for gene fusion detection on DNA samples.
I therefore propose a pipeline for gene fusion detection on DNA samples, developed with Nextflow as the Workflow Management System. The pipeline takes paired-end FASTQ reads as input, processes the samples with dedicated detection tools, and then ranks the detected gene fusions with prioritization tools. This last step is required to evaluate the oncogenic potential of each resulting gene fusion. The pipeline is composed of three consecutive steps: the quality check of the input reads with FastQC, the detection of gene fusions in the analyzed samples (GeneFuse, GRIDSS), and the prioritization (ChimerDriver, DEEPrior, and Oncofuse) to obtain a gene fusion score. GRIDSS is a tool for detecting structural variants in DNA, including inversions, translocations, and genomic imbalances such as insertions or deletions. Its output is therefore processed further with Linx, a tool for annotating and interpreting structural variants, to derive gene fusions.
I developed the pipeline in compliance with the guidelines required by nf-core, a curated and citable community in continuous development that aims to standardize and share bioinformatics pipelines following the FAIR (Findability, Accessibility, Interoperability, Reusability) principles.
The spirit of the nf-core group is to collaborate and share knowledge and skills, breaking down the barriers often observed between different research groups. This thesis aims to create a standardized, well-documented, stable, and reusable (regardless of the execution platform) pipeline for detecting gene fusions on DNA samples. This is an open and expanding field of research: gene fusions are widely documented to be related to cancer mechanisms, so their analysis could be fundamental for diagnostic and therapeutic purposes. Furthermore, compliance with nf-core standards ensures user-friendly reproducibility and developer support. My work can therefore benefit anyone in the scientific community interested in detecting gene fusions on DNA samples, whose availability will continue to increase over time.
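
The stage ordering described in the abstract can be sketched as a small dependency plan. This is only an illustration in Python, not the thesis's Nextflow code: the `Step` class, the `build_plan` and `is_topologically_ordered` functions, and the stage labels are hypothetical names introduced here. The sketch also assumes that detection waits for the quality check, reading "consecutive steps" literally, and it does not invoke any of the tools.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One node of the workflow: a tool run within a named stage (hypothetical model)."""
    stage: str
    tool: str
    depends_on: Tuple[str, ...] = ()


def build_plan(
    detectors: Tuple[str, ...] = ("GeneFuse", "GRIDSS"),
    prioritizers: Tuple[str, ...] = ("ChimerDriver", "DEEPrior", "Oncofuse"),
) -> List[Step]:
    """Return the steps in an order where every dependency comes first."""
    plan = [Step("quality_check", "FastQC")]
    fusion_sources = []  # tools whose output is a list of gene fusions
    for detector in detectors:
        # Assumption: detection starts only after the quality check.
        plan.append(Step("detection", detector, ("FastQC",)))
        if detector == "GRIDSS":
            # GRIDSS reports structural variants; Linx derives gene fusions from them.
            plan.append(Step("detection", "Linx", ("GRIDSS",)))
            fusion_sources.append("Linx")
        else:
            fusion_sources.append(detector)
    for prioritizer in prioritizers:
        # Each prioritization tool scores the fusions from every detection branch.
        plan.append(Step("prioritization", prioritizer, tuple(fusion_sources)))
    return plan


def is_topologically_ordered(plan: List[Step]) -> bool:
    """Check that each step appears after all the tools it depends on."""
    seen = set()
    for step in plan:
        if any(dep not in seen for dep in step.depends_on):
            return False
        seen.add(step.tool)
    return True


if __name__ == "__main__":
    for step in build_plan():
        deps = ", ".join(step.depends_on) or "input FASTQ pairs"
        print(f"{step.stage:15} {step.tool:13} <- {deps}")
```

Modelling the plan as data keeps the extra Linx step tied to the GRIDSS branch only, so selecting a single detector such as `build_plan(detectors=("GeneFuse",))` yields a plan without Linx.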

Supervisors: Santa Di Cataldo, Elisa Ficarra, Marta Lovino
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 116
Degree programme: Master's degree programme in Biomedical Engineering
Degree class: New organization > Master science > LM-21 - BIOMEDICAL ENGINEERING
Collaborating institutions: Institut Curie Centre de Recherche
URI: http://webthesis.biblio.polito.it/id/eprint/23745