Shayan Taghinezhad Roudbaraki
Benchmarking Synonym Extraction Methods in Domain-Specific Contexts.
Supervisors: Luca Cagliero, Luca Gioacchini, Irene Benedetto. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Accurate identification of synonyms is crucial for several Natural Language Processing tasks, and it presents significant challenges in specialized domains. These challenges arise from unique vocabularies, domain jargon, the semantic shift of words used outside general-domain language, and the limited availability of domain-specific resources for synonym detection. This thesis analyzes methods for synonym extraction in domain-specific contexts by evaluating a subset of techniques on a multi-domain dataset that includes terms, their usage contexts, and ground-truth synsets for domains such as agriculture, automotive, economy, geography, legal, medical, and technology. The analysis includes synonym extraction using traditional lexical resources such as WordNet, distributional semantic models such as fastText in their various available forms, domain-specific corpus training and fine-tuning, and contextual embedding models such as BERT.
Clustering algorithms are also investigated when applied to combined term and definition representations.
