Daniele Amati
Automated preference data mining: Techniques for extracting insights from patient preference studies.
Supervisors: Filippo Molinari, Paola Berchialla. Politecnico di Torino, Master's degree programme in Biomedical Engineering, 2024
PDF (Tesi_di_laurea)
- Thesis
Access restricted to: staff users only until 10 June 2026 (embargo date). License: Creative Commons Attribution Non-commercial No Derivatives. Download (3MB) |
Abstract: |
The scientific community is witnessing unprecedented growth in medical research, resulting in an overwhelming flow of published studies. The need to extract relevant information from these articles efficiently and effectively has therefore become increasingly important. This thesis aimed to tackle this challenge by developing an interactive Shiny app for medical patient preference studies (PPS). PPS assess patients' values, priorities, and choices regarding healthcare, treatments, or outcomes in order to better align medical decisions with their preferences and improve care. The objective was to create a user-friendly web application that enables researchers to search and analyze a vast database of scientific articles related to patient preferences in medicine. With an interactive platform to explore and analyze medical patient preference studies, researchers can more easily identify patterns, trends, and insights that may have been overlooked using traditional methods. To achieve this goal, various natural language processing (NLP) techniques and classification algorithms were developed on an initial database and employed in a framework that automatically labels new articles upon insertion. Using this dataset, several classifiers were built, leveraging both machine learning algorithms and deep learning architectures, and applying different text representation techniques: bag of words (BOW), term frequency-inverse document frequency (TF-IDF), Global Vectors for Word Representation (GloVe), and Bidirectional Encoder Representations from Transformers (BERT). The article labeling task was divided into separate classification problems, and for each label the classifier that achieved the highest performance was implemented.
For the intervention labeling task, a balanced accuracy averaged among classes of 87.6%, a sensitivity of 85.6%, and a negative predictive value (NPV) of 85.1% were achieved; for the therapeutic area labeling task, a balanced accuracy of 94.5%, a sensitivity of 93.7%, and an NPV of 94.0% were achieved. The user interface was built using the Shiny framework for Python, which allows the creation of a front-end interface without expertise in web development. The interfaces were designed and implemented using a modular approach. For the back end, a microservice architecture was designed, with microservices communicating through REST APIs; it was illustrated using UML diagrams and ultimately deployed in Docker containers. The database was built as a MySQL database, and a filter system based on bibliographic metadata and classifier labels was created to facilitate data consultation. The Shiny app displays a histogram of publication years and a map of the various publications. Users can upload data in the required format; the new records are then labeled by the classifier and added to the SQL database for further consultation. Finally, an artificial intelligence search assistant chatbot was built using Meta's Llama 3 with retrieval-augmented generation (RAG) to automatically search for answers to users' questions across all study abstracts in the database. |
---|---|
Supervisors: | Filippo Molinari, Paola Berchialla |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 163 |
Subjects: | |
Degree programme: | Master's degree programme in Biomedical Engineering |
Degree class: | New system > Master's degree > LM-21 - BIOMEDICAL ENGINEERING |
Collaborating institutions: | UNIVERSITA' DEGLI STUDI DI TORINO |
URI: | http://webthesis.biblio.polito.it/id/eprint/33660 |
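The evaluation metrics reported in the abstract (balanced accuracy averaged among classes, sensitivity, and NPV) can be illustrated with a minimal sketch. It assumes a one-vs-rest reading of each class followed by an unweighted average over classes; the abstract does not state the exact averaging procedure, so this is an illustration under that assumption rather than the thesis's evaluation code. The function names and the toy labels are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Compute metrics treating `positive` as the positive class
    and every other label as negative (one-vs-rest assumption)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    # Guard against empty denominators on very small samples.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def macro_average(y_true, y_pred):
    """Average the one-vs-rest metrics over all classes present in y_true."""
    classes = sorted(set(y_true))
    per_class = [one_vs_rest_metrics(y_true, y_pred, c) for c in classes]
    return {
        name: sum(m[name] for m in per_class) / len(per_class)
        for name in per_class[0]
    }


# Hypothetical toy labels, not data from the thesis.
y_true = ["drug", "drug", "device", "device", "device", "other"]
y_pred = ["drug", "device", "device", "device", "other", "other"]

scores = macro_average(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy; that is a common reason to report it for multi-label article classification.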