A toolbox for the analysis and visualization of miRNA and isomiR expression levels

Ermes La Porta

Supervisors: Gianvito Urgese, Walter Gallego Gomez. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

MicroRNAs (miRNAs) are short non-coding RNA sequences involved in crucial biological processes. Multiple studies show that they can serve as biomarkers for human diseases such as cancer and Parkinson's disease, as well as phylogenetic markers for species evolution. Thanks to the large volume of high-quality RNA reads that Next Generation Sequencing produces, miRNA analysis has improved over the years, with computer science playing an increasingly important role in this area of bioinformatics. miRNAs bind to messenger RNA mostly through a specific nucleotide region, the so-called seed sequence, so miRNA alignment must prioritize the conservation of the seed. Among the many software tools available for this task, isomiR-SEA stands out as an optimized tool that accounts for the specific characteristics of miRNAs. It performs a seed-based alignment, which yields higher accuracy than general-purpose aligners. Moreover, it accurately identifies the multiple isoforms (isomiRs) that a miRNA family can have. For greater efficiency, isomiR-SEA is often paired with BioSeqZip, a collapsing tool that preprocesses the input data. The aim of this thesis is to build a modern toolbox for miRNA analysis (iSEA-TB) around a new, unpublished version of isomiR-SEA, in order to demonstrate its computational capabilities and flexibility while offering the user a reproducible analysis pipeline and an intuitive graphical user interface for results visualization and downstream analysis. The first component of iSEA-TB is a pipeline written in Nextflow, an open-source workflow management tool widely adopted in bioinformatics. The pipeline automatically runs all the steps required for a complete miRNA analysis: data download, quality check, trimming, collapsing, alignment, expression level estimation, and results consolidation.
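The collapsing, seed-based matching, and expression estimation steps named above can be illustrated with a minimal Python sketch. It is not the algorithm used by BioSeqZip or isomiR-SEA: the seed is assumed to be nucleotides 2-8 of the reference miRNA (a common convention), the matching is exact, and the reference names, sequences, and the functions `collapse_reads`, `seed_hits`, and `estimate_expression` are hypothetical, introduced here only for illustration.

```python
from collections import Counter

# Assumption: the seed is nucleotides 2-8 of the reference miRNA (1-based),
# a common convention; the definition used by isomiR-SEA may differ.
SEED = slice(1, 8)
EXPECTED_SEED_START = 1  # 0-based seed position in a read without 5' variation


def collapse_reads(reads):
    """Collapse identical reads into a sequence -> count mapping."""
    return Counter(reads)


def seed_hits(read, references, max_shift=2):
    """Names of references whose seed occurs in the read near its expected
    position; max_shift tolerates 5' trimming or addition (isomiRs)."""
    lo = max(0, EXPECTED_SEED_START - max_shift)
    hi = EXPECTED_SEED_START + max_shift
    hits = []
    for name, sequence in references.items():
        seed = sequence[SEED]
        # str.find restricts the search so the seed must start in [lo, hi].
        if read.find(seed, lo, hi + len(seed)) != -1:
            hits.append(name)
    return hits


def estimate_expression(collapsed, references):
    """Sum the counts of collapsed reads assigned to each reference."""
    expression = Counter()
    for read, count in collapsed.items():
        for name in seed_hits(read, references):
            expression[name] += count
    return expression


# Hypothetical toy references and reads (DNA alphabet).
references = {
    "toy-miR-1": "TGAGGTAGTAGGTTGTATAGTT",
    "toy-miR-2": "TAGCTTATCAGACTGATGTTGA",
}
reads = (
    ["TGAGGTAGTAGGTTGTATAGTT"] * 3      # canonical toy-miR-1
    + ["GAGGTAGTAGGTTGTATAGTT"]         # 5'-trimmed variant of toy-miR-1
    + ["TAGCTTATCAGACTGATGTTGAC"] * 2   # toy-miR-2 with a 3' addition
    + ["ACGTACGTACGTACGTACGT"]          # unrelated read
)

collapsed = collapse_reads(reads)
expression = estimate_expression(collapsed, references)
```

Collapsing first means each distinct sequence is matched once and its count is carried along, which is the efficiency argument for pairing a collapser with the aligner. A real aligner would also score the flanking regions and handle mismatches, which this sketch omits.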
To make the pipeline work, BioSeqZip and isomiR-SEA were modified so that they interface correctly with the processed data. The second component of iSEA-TB is a database-powered interactive analysis interface that lets users interrogate the pipeline results through SQL queries and build visualizations useful for downstream analysis, such as miR and isomiR expression levels, miR region conservation, and A-to-I (A2I) substitutions. The interface was built with the open-source software Grafana, which interfaces seamlessly with databases and offers multiple data visualization and navigation modes. The execution report generated by Nextflow was used to evaluate pipeline performance: it includes runtime and memory usage, which were compared against those of state-of-the-art tools. The pipeline can analyze considerable amounts of data in reasonable time thanks to the modern C++ implementation of isomiR-SEA and BioSeqZip, showing that this combination of tools is well suited to large-scale miRNA analysis. To demonstrate the usability of the GUI, two meaningful datasets were analyzed: raw RNA reads used by the MirGeneDB 3.0 database, and a collection of human primary cell reads from the human microRNAome. The GUI produces interactive graphical representations that dynamically show different subsets of the results, which facilitates the comparison of miRNA isoform expression levels across the requested selection. This flexible interface is expected to prove helpful for future pathological and phylogenetic studies.
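As an illustration of the kind of SQL query such an interface could issue, the following Python sketch aggregates miR and isomiR read counts per sample. The table `isomir_counts`, its columns, the variant labels, and the data are hypothetical, and an in-memory SQLite database is used only to keep the example self-contained; the abstract does not state which database engine or schema iSEA-TB uses.

```python
import sqlite3

# Hypothetical schema: one row per (sample, miRNA, variant) with its read count.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE isomir_counts (
           sample TEXT, mirna TEXT, variant TEXT, read_count INTEGER)"""
)
rows = [
    ("s1", "toy-miR-1", "canonical", 120),
    ("s1", "toy-miR-1", "3p_addition", 30),
    ("s1", "toy-miR-2", "canonical", 50),
    ("s2", "toy-miR-1", "canonical", 80),
    ("s2", "toy-miR-1", "5p_trim", 20),
]
conn.executemany("INSERT INTO isomir_counts VALUES (?, ?, ?, ?)", rows)

# Total reads per miRNA and sample, plus the reads carried by non-canonical
# isoforms; a dashboard panel could plot either column.
query = """
SELECT sample,
       mirna,
       SUM(read_count) AS total_reads,
       SUM(CASE WHEN variant <> 'canonical' THEN read_count ELSE 0 END)
           AS isomir_reads
FROM isomir_counts
GROUP BY sample, mirna
ORDER BY sample, mirna
"""
result = conn.execute(query).fetchall()
conn.close()
```

Keeping the results in a relational table lets the same data feed several panels, with each visualization defined by its own query.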

Supervisors: Gianvito Urgese, Walter Gallego Gomez
Academic year: 2025/26
Publication type: Electronic
Number of pages: 64
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/38649