Shayan Taghinezhad Roudbaraki
Benchmarking Synonym Extraction Methods in Domain-Specific Contexts.
Supervisors: Luca Cagliero, Luca Gioacchini, Irene Benedetto. Politecnico di Torino, Master of Science program in Computer Engineering, 2025
Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Accurate identification of synonyms is crucial for several Natural Language Processing tasks, and it is particularly challenging in specialized domains. The difficulties stem from unique vocabularies, domain jargon, the semantic shift that words undergo outside general usage, and the limited availability of domain-specific resources for synonym detection. This thesis analyzes methods for synonym extraction in domain-specific contexts by evaluating a subset of techniques on a multi-domain dataset that includes terms, their usage contexts, and ground-truth synsets in domains such as agriculture, automotive, economy, geography, legal, medical, and technology. The analysis includes synonym extraction using traditional lexical resources such as WordNet, various available forms of distributional semantic models such as fastText, domain-specific corpus training and fine-tuning, and contextual embedding models such as BERT.
Clustering algorithms are also investigated when applied to combined term and definition representations.
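To illustrate the distributional approach mentioned in the abstract, the following minimal sketch ranks candidate synonyms by cosine similarity between embedding vectors. It is not the thesis's pipeline: the function names, the toy vocabulary, and the three-dimensional vectors are all hypothetical, standing in for vectors that a model such as fastText or BERT would supply.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_synonym_candidates(term, embeddings, top_k=3):
    """Return the top_k other terms most similar to `term`, best first."""
    target = embeddings[term]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors, invented for illustration only.
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "tractor": [0.5, 0.5, 0.1],
    "statute": [0.0, 0.1, 0.95],
}

ranked = rank_synonym_candidates("car", toy_embeddings, top_k=2)
for candidate, score in ranked:
    print(f"{candidate}: {score:.3f}")
```

In practice, high similarity alone does not distinguish synonyms from merely related terms, so a ranking like this would typically be scored against ground-truth synsets such as those the abstract describes.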
