Mario Capobianco
Automatic detection of fitness shifts in pathogen phylogenies using contrastive learning.
Supervisors: Roberta Bardini, Stefano Di Carlo, Alessandro Savino, Alexander Zarebski, Gabriele Marino. Politecnico di Torino, Master's degree in Data Science And Engineering, 2025
PDF (Tesi_di_laurea) - Thesis, 14MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

The continual emergence and extinction of pathogen lineages, driven by factors including immune escape, environmental changes, and differences in transmissibility, poses major challenges for public health. Detecting lineages with increased fitness is critical for understanding epidemiological shifts and guiding targeted interventions. Existing approaches for detecting fitness changes among lineages, such as PhyloWave, extract summary statistics from trees and use them to identify lineages with different evolutionary dynamics. However, PhyloWave depends on substantial domain knowledge and on thresholds that require manual fine-tuning, often relying on expert judgment or arbitrary choices, which limits its scalability and robustness across different pathogens. To overcome these limitations, we integrate contrastive representation learning with phylogenetic modeling. We implement a generalization of the multi-type birth-death (MTBD) model in which mutation events alter lineage-specific transmission rates, and use it to simulate training and testing data that capture a wide range of fitness scenarios. Building on this foundation, we design a supervised learning strategy for community detection in phylogenies, in which recursive neural networks learn clade representations. By optimizing a contrastive loss, our model is encouraged to learn separate representations for lineages undergoing distinct fitness dynamics. Variant representations are input to a classification head, where the similarity between each tree node and its parent acts as a continuous metric of relatedness. Higher similarity indicates a greater likelihood of shared evolutionary properties, enabling identification of variants with fitness changes along the tree.

This approach eliminates dependence on ad hoc parameterization and establishes a principled and scalable methodological framework for monitoring the evolutionary fitness of circulating lineages, with potential applications across viral and bacterial pathogens.

| Field | Value |
|---|---|
| Supervisors: | Roberta Bardini, Stefano Di Carlo, Alessandro Savino, Alexander Zarebski, Gabriele Marino |
| Academic year: | 2025/26 |
| Publication type: | Electronic |
| Number of pages: | 90 |
| Subjects: | |
| Degree programme: | Master's degree in Data Science And Engineering |
| Degree class: | New regulations > Master's degree > LM-32 - Computer Engineering |
| Collaborating companies: | Politecnico di Torino |
| URI: | http://webthesis.biblio.polito.it/id/eprint/38763 |
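
The abstract describes using the similarity between each tree node and its parent as a continuous measure of relatedness. The minimal sketch below illustrates that idea only. It is not the thesis implementation: the toy tree, the hand-written embeddings, the function names `cosine` and `parent_similarity`, and the choice of cosine similarity are all assumptions made for illustration, whereas the thesis learns clade representations with recursive neural networks and passes them to a classification head.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def parent_similarity(embeddings, parent):
    """Return {node: cosine similarity to its parent} for every non-root node.

    embeddings: dict mapping node -> vector (stand-in for learned clade
                representations; hypothetical values here)
    parent:     dict mapping node -> parent node; the root has no entry
    """
    return {
        node: cosine(embeddings[node], embeddings[par])
        for node, par in parent.items()
    }


# Toy tree: root -> a -> (b, c). Embeddings are invented for illustration.
embeddings = {
    "root": [1.0, 0.0],
    "a": [1.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 1.0],  # points away from its parent "a"
}
parent = {"a": "root", "b": "a", "c": "a"}

scores = parent_similarity(embeddings, parent)

# The node least similar to its parent is the strongest candidate
# for a change in dynamics under this toy scoring.
candidate = min(scores, key=scores.get)
```

In this toy setup, nodes whose representation matches their parent score close to 1, while a node whose representation diverges scores lower and would be the first one inspected for a possible fitness change.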



Creative Commons License - Attribution 3.0 Italy