Flavio Spuri
Optimizing Genome Representations for Cancer Type Classification.
Supervisor: Alfredo Benso. Politecnico di Torino, Master of Science program in Data Science and Engineering, 2024
PDF (Tesi_di_laurea) - Thesis
Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
In recent years, Large Language Models (LLMs) have been successfully adapted to genomics, as shown by models such as DNABERT, DNABERT-2, and Nucleotide Transformer. Despite this, their application to the challenging field of cancer genomics remains unexplored. This thesis examines whether cancer genome analysis can benefit from large, pre-trained Transformer-based models, focusing specifically on the recently introduced HyenaDNA architecture. In HyenaDNA, traditional attention layers are replaced by so-called Hyena filters, which consist of recursions of an element-wise multiplicative gating and a long convolution. This design allows longer sequences to be processed at single-base resolution with subquadratic computational cost, which aligns well with the specific needs of cancer genomics.
This study begins by assessing HyenaDNA's capabilities to represent genomic data
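The recursion of element-wise gating and long convolution mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's or HyenaDNA's implementation: the function names `causal_fft_conv` and `hyena_like_recurrence` are hypothetical, the sequence is a single channel, and the gates and filters are passed in as plain arrays rather than being learned. The subquadratic cost comes from evaluating each long convolution with an FFT in O(L log L) instead of O(L^2).

```python
import numpy as np


def causal_fft_conv(signal, kernel):
    """Causal convolution of two length-L sequences in O(L log L) via FFT."""
    length = signal.shape[0]
    # Zero-pad to 2L so the circular convolution equals the linear one.
    n_fft = 2 * length
    spectrum = np.fft.rfft(signal, n=n_fft) * np.fft.rfft(kernel, n=n_fft)
    # Keep only the first L outputs: output t depends on inputs 0..t only.
    return np.fft.irfft(spectrum, n=n_fft)[:length]


def hyena_like_recurrence(value, gates, filters):
    """Alternate a long convolution with element-wise gating (illustrative).

    value   : array of shape (L,), the sequence being transformed
    gates   : list of arrays of shape (L,), one per recursion step
    filters : list of arrays of shape (L,), one long filter per step
    """
    z = value
    for gate, kernel in zip(gates, filters):
        # One step: convolve along the sequence, then gate position-wise.
        z = gate * causal_fft_conv(z, kernel)
    return z
```

In this sketch each filter is as long as the sequence itself, so every position can in principle be influenced by every earlier position without computing a pairwise attention matrix.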
