Drug-likeness Prediction and Fragment Extraction using Transformer-based Graph Neural Network on Traditional Chinese Medicine Molecules

Marco Colangelo

Drug-likeness Prediction and Fragment Extraction using Transformer-based Graph Neural Network on Traditional Chinese Medicine Molecules.

Supervisors: Stefano Di Carlo, Alessandro Savino, Roberta Bardini, Riccardo Smeriglio. Politecnico di Torino, Master's degree in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

The use of Traditional Chinese Medicine (TCM) spans thousands of years, yet its integration into modern pharmaceutical research has been limited. A major challenge is the lack of systematic evaluation of the chemical properties of TCM compounds, which slows their development into approved pharmaceuticals. Drug-likeness, the set of physicochemical and structural properties that make a molecule potentially suitable for development as a pharmaceutical drug, is a crucial metric for determining whether a compound could be a viable drug candidate. Given the diversity and complexity of TCM, manually evaluating each compound for drug-likeness is impractical. An efficient, systematic approach is therefore needed to assess the drug-likeness of TCM compounds and to understand the chemical structures that contribute to their therapeutic potential. To address this challenge, this thesis proposes a data-driven approach that uses structured data and machine learning to evaluate the drug-likeness of TCM compounds systematically, enabling the identification of promising candidates for pharmaceutical development. The strategy involves building a custom Transformer-based Graph Neural Network model that predicts drug-likeness by analyzing molecular structures and identifies the most pharmacologically relevant chemical substructures within each compound. The model is trained, validated and tested on ZINC, a curated collection of commercially available chemical compounds designed for virtual screening; only compounds from the "in vitro" and "in vivo" categories are selected. The model achieves an accuracy of 83%, a precision of 80% and a recall of 88% on the test set. The trained model is then applied to a TCM dataset.
Applying the model to TCM data indicates which compounds may be drug-like and offers insights into the specific chemical fragments that contribute to drug-likeness, revealing patterns within TCM’s unique molecular compositions. We then group TCM compounds into clusters based on structural similarity, using the extracted relevant fragments. Each cluster is represented by a centroid molecule that encapsulates the key chemical features of that group. Finally, we conduct a literature review to explore the pharmaceutical applications of these centroid molecules, validating the drug-likeness predictions against known pharmacological data and uncovering therapeutic potential within TCM that aligns with modern drug development goals. Out of 149 clusters, 117 have confirmed archetypes, or molecules closely related to these archetypes, that may be considered tested or clinically used drugs. Through this application, the thesis bridges ancient medicinal knowledge and novel computational techniques, opening new possibilities for sustainable drug discovery from natural resources. The fragment extraction also revealed repeated patterns, which could be examined further in future research. The clustering approach enabled us to identify representative compounds with promising drug-like properties, highlighting the potential of integrating TCM compounds into modern pharmaceutical development. This provides a foundation for future drug discovery efforts that integrate traditional remedies into modern medicine.
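
The clustering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which similarity measure or clustering algorithm the thesis uses, so everything here is an assumption made for illustration: fragments are treated as plain string labels, similarity is Jaccard overlap between fragment sets, clusters are formed by a simple greedy "leader" pass with a hypothetical threshold of 0.5, and the cluster representative is taken to be the medoid (the member most similar to the rest of its cluster). The molecule names and fragment labels are invented toy data, not results from the thesis.

```python
from __future__ import annotations


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity between two fragment sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_clusters(fragments: dict[str, frozenset[str]],
                    threshold: float) -> list[list[str]]:
    """Greedy leader clustering (assumed, not the thesis method).

    A molecule joins the first cluster whose leader (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters: list[list[str]] = []
    for name, frags in fragments.items():
        for cluster in clusters:
            if jaccard(frags, fragments[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def medoid(cluster: list[str], fragments: dict[str, frozenset[str]]) -> str:
    """Member with the highest total similarity to all members of its cluster."""
    return max(
        cluster,
        key=lambda m: sum(jaccard(fragments[m], fragments[o]) for o in cluster),
    )


# Hypothetical toy data: molecule name -> set of extracted fragment labels.
fragments = {
    "mol_a": frozenset({"phenol", "lactone"}),
    "mol_b": frozenset({"phenol", "lactone", "methoxy"}),
    "mol_c": frozenset({"phenol", "lactone", "methoxy", "glycoside"}),
    "mol_d": frozenset({"indole", "amine"}),
    "mol_e": frozenset({"indole", "amine", "methoxy"}),
}

clusters = leader_clusters(fragments, threshold=0.5)
centroids = [medoid(cluster, fragments) for cluster in clusters]

for cluster, centroid in zip(clusters, centroids):
    print(f"cluster={cluster} representative={centroid}")
```

A medoid is used here rather than an averaged centroid because a representative must be an actual molecule that can be looked up in the literature; whether the thesis defines its centroid molecules this way is not stated in the abstract.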

Supervisors: Stefano Di Carlo, Alessandro Savino, Roberta Bardini, Riccardo Smeriglio
Academic year: 2024/25
Publication type: Electronic
Number of pages: 74
Subjects:
Degree programme: Master's degree in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering (Ingegneria Informatica)
Joint supervision institution: Kyoto Institute of Technology (Japan)
Collaborating companies: Kyoto Institute of Technology
URI: http://webthesis.biblio.polito.it/id/eprint/34057