
Nastaran Ahmadi Bonakdar
Epigenetic Mechanisms in the Development of Neoplasms.
Supervisor: Alfredo Benso. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025.
PDF (degree thesis), 4 MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:
This thesis explores the role of DNA methylation outliers in breast cancer within the framework of the epigenetic field defect hypothesis. The hypothesis posits that stochastic epigenetic alterations in histologically normal tissue may signal early carcinogenic processes. Using the GSE69914 dataset from the Gene Expression Omnibus, we analyzed methylation profiles from three tissue types: normal, cancer-adjacent normal, and cancerous breast tissue. The preprocessing pipeline first converted raw beta values to M-values to reduce heteroscedasticity. It then applied normalization, dimensionality reduction, and group labeling. Further steps included variance-based filtering, Z-score transformations, and the exclusion of low-quality or invariant CpG sites. Earlier studies relied solely on differential analysis. This work instead applied unsupervised machine learning algorithms for outlier detection, with the goal of identifying CpGs whose methylation values deviate substantially from the typical population-level distribution. Variance thresholds of 0.015 and 0.02 were tested to balance signal retention against computational feasibility. Five methods were applied across multiple hyperparameter settings to detect anomalous CpG methylation patterns: K-Nearest Neighbors (KNN), Isolation Forest, Local Outlier Factor (LOF), One-Class SVM (OC-SVM), and Z-score analysis. CpG sites such as cg19374752 and cg00000622 were consistently flagged as outliers across tissues and algorithms. In the comparative analysis, Z-score detection achieved the highest recall and F1-score, whereas One-Class SVM achieved the highest precision. This suggests that each method suits different diagnostic priorities. A secondary benchmark on the labeled Thyroid Disease dataset was used to validate the comparative performance of the algorithms on structured data.
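The preprocessing and Z-score steps described above can be sketched as follows. This is a minimal NumPy illustration, not the thesis code, and it rests on several assumptions:

- The data form a matrix with one CpG per row and one sample per column.
- The variance threshold is applied to beta values. The abstract does not say which scale the 0.015 and 0.02 thresholds refer to.
- The cut-off of |z| > 3 is an illustrative choice.

The function names `beta_to_m`, `variance_filter` and `zscore_outliers` are hypothetical. The M-value is the standard logit transform `M = log2(beta / (1 - beta))`.

```python
import numpy as np


def beta_to_m(beta, eps=1e-6):
    """Convert beta values in [0, 1] to M-values: M = log2(beta / (1 - beta)).

    Betas are clipped to [eps, 1 - eps] so fully (un)methylated sites stay finite.
    """
    b = np.clip(np.asarray(beta, dtype=float), eps, 1.0 - eps)
    return np.log2(b / (1.0 - b))


def variance_filter(beta, threshold=0.015):
    """Boolean mask of CpGs (rows) whose variance across samples exceeds threshold.

    Assumption: the threshold is applied on the beta-value scale.
    """
    return np.var(beta, axis=1) > threshold


def zscore_outliers(m, z_cut=3.0):
    """Flag (CpG, sample) entries lying more than z_cut SDs from that CpG's mean."""
    mu = m.mean(axis=1, keepdims=True)
    sd = m.std(axis=1, keepdims=True)   # population SD (ddof=0)
    sd[sd == 0] = np.inf                # invariant CpGs get z = 0, no division by zero
    z = (m - mu) / sd
    return np.abs(z) > z_cut, z


# Usage on a toy matrix: 3 hypothetical CpGs (rows) x 20 samples (columns).
beta = np.empty((3, 20))
beta[0] = 0.5                         # invariant CpG, removed by the variance filter
beta[1] = np.tile([0.3, 0.7], 10)     # variable but symmetric, no single outlier
beta[2] = 0.2
beta[2, 19] = 0.95                    # one strongly hypermethylated sample
keep = variance_filter(beta, threshold=0.015)
flags, z = zscore_outliers(beta_to_m(beta[keep]))
```

With 20 samples, a single deviating sample reaches |z| = sqrt(19) ≈ 4.36. In the toy data, that makes the last sample of the third CpG the only flagged entry. Under this population-SD definition, a lone outlier can reach at most |z| = (n - 1) / sqrt(n). With 10 or fewer samples it therefore cannot exceed 3, which matters when choosing the cut-off for small groups.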
Pathway enrichment analysis of the genes associated with the most frequently outlying CpGs highlighted cancer-relevant biological processes, including DNA repair, Notch signaling, and estrogen response pathways. This work demonstrates the feasibility and diagnostic potential of methylation-based outlier detection in cancer and proposes a flexible, scalable pipeline for epigenetic biomarker discovery. It also underlines the value of integrating multiple detection models and tuning preprocessing thresholds to uncover biologically meaningful patterns in complex, high-dimensional data.
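The idea of integrating several detection models can be illustrated with a simple voting scheme. The sketch below is a hypothetical, NumPy-only simplification:

- It treats each CpG as a point (one row) described by a feature vector, for example its M-values across the samples of one tissue group. This orientation is an assumption.
- It implements only two of the five detectors: a k-NN distance score and a maximum absolute Z-score.
- Each detector votes for its top 5% highest-scoring CpGs. A CpG counts as a consensus outlier when at least two detectors agree.

The function names, `top_frac` and `min_votes` are illustrative assumptions. Isolation Forest, LOF and OC-SVM could be added as further voters, for example through scikit-learn's `IsolationForest`, `LocalOutlierFactor` and `OneClassSVM`.

```python
import numpy as np


def knn_scores(x, k=5):
    """k-NN outlier score: Euclidean distance from each row to its k-th nearest other row."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    return np.sort(d, axis=1)[:, k - 1]


def zscore_scores(x):
    """Largest absolute z-score of each row, standardising every feature column-wise."""
    sd = x.std(axis=0)
    sd[sd == 0] = np.inf                   # constant features contribute z = 0
    return np.abs((x - x.mean(axis=0)) / sd).max(axis=1)


def consensus_outliers(x, detectors, top_frac=0.05, min_votes=2):
    """Each detector votes for its top_frac highest-scoring rows; keep rows with >= min_votes."""
    n_top = max(1, int(round(top_frac * len(x))))
    votes = np.zeros(len(x), dtype=int)
    for score_fn in detectors:
        scores = score_fn(x)
        votes[np.argsort(scores)[-n_top:]] += 1
    return np.flatnonzero(votes >= min_votes), votes


# Usage on synthetic data: 100 hypothetical CpGs x 4 features, one planted outlier.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
x[7] += 10.0                               # CpG 7 deviates strongly in every feature
detectors = [zscore_scores, lambda a: knn_scores(a, k=5)]
idx, votes = consensus_outliers(x, detectors, top_frac=0.05, min_votes=2)
```

Voting on ranks rather than raw scores sidesteps the fact that the detectors produce scores on incomparable scales. The brute-force distance matrix needs O(n²) memory, however. For hundreds of thousands of CpGs, an index-based neighbour search would be needed in practice.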
| Field | Value |
|---|---|
| Supervisor | Alfredo Benso |
| Academic year | 2024/25 |
| Publication type | Electronic |
| Number of pages | 45 |
| Degree programme | Master's degree programme in Data Science And Engineering |
| Degree class | New regulations > Master's degree > LM-32 - Computer Engineering |
| Collaborating organizations | Politecnico di Torino |
| URI | http://webthesis.biblio.polito.it/id/eprint/36339 |