Shayan Taghinezhad Roudbaraki
Benchmarking Synonym Extraction Methods in Domain-Specific Contexts.
Supervisors: Luca Cagliero, Luca Gioacchini, Irene Benedetto. Politecnico di Torino, Master of Science program in Computer Engineering, 2025
PDF (Tesi_di_laurea), 3MB - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
| Abstract: | Accurate identification of synonyms is crucial for several Natural Language Processing tasks, and it presents significant challenges in specialized domains. These challenges arise from unique vocabularies, domain jargon, the semantic shift of words used outside general-purpose contexts, and limited domain-specific resources for synonym detection. This thesis analyzes methods for synonym extraction in domain-specific contexts by evaluating a subset of techniques on a multi-domain dataset that includes terms, their usage contexts, and ground-truth synsets in domains such as agriculture, automotive, economy, geography, legal, medical, and technology. The analysis includes synonym extraction using traditional lexical resources such as WordNet, various available forms of distributional semantic models such as fastText, domain-specific corpus training and fine-tuning, and contextual embedding models such as BERT. Clustering algorithms are also investigated when applied to combined term and definition representations. For a more thorough analysis, Named Entity Recognition for term identification is explored and compared with information extraction models and large language models (LLMs) on the same task. Additionally, the capabilities of LLMs for definition generation and synonym grouping are explored. Experiments are evaluated using standard Precision, Recall, and F1-score metrics specifically adapted for synset recovery, and recall for term identification. The research concludes that the proposed multi-step approach is currently the most effective for synset creation. It consists of term identification and definition generation by an LLM, unsupervised clustering, and a final refinement of the clusters by an LLM. |
|---|---|
| Supervisors: | Luca Cagliero, Luca Gioacchini, Irene Benedetto |
| Academic year: | 2024/25 |
| Publication type: | Electronic |
| Number of Pages: | 78 |
| Subjects: | |
| Degree program: | Master of Science program in Computer Engineering |
| Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
| Collaborating companies: | MAIZE S.R.L. |
| URI: | http://webthesis.biblio.polito.it/id/eprint/36445 |
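
The abstract mentions Precision, Recall, and F1-score adapted for synset recovery but does not spell out the adaptation. As an illustration only, the sketch below uses one common choice, pairwise scoring over synonym pairs. This is an assumption and not necessarily the definition used in the thesis; the function names `synonym_pairs` and `pairwise_prf` and the toy data are hypothetical.

```python
from itertools import combinations


def synonym_pairs(synsets):
    """Return the set of unordered term pairs that share a synset."""
    pairs = set()
    for synset in synsets:
        # Sort so each unordered pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(synset)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall, and F1 of predicted synsets against gold synsets.

    Assumption: a predicted pair counts as correct when both terms also
    appear together in some gold synset.
    """
    pred_pairs = synonym_pairs(predicted)
    gold_pairs = synonym_pairs(gold)
    true_positives = len(pred_pairs & gold_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example (invented data, not taken from the thesis dataset).
gold = [{"car", "automobile", "vehicle"}, {"loan", "credit"}]
predicted = [{"car", "automobile"}, {"vehicle", "loan", "credit"}]

precision, recall, f1 = pairwise_prf(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Pairwise scoring penalizes both over-merged clusters (lower precision) and split synsets (lower recall); other adaptations, such as exact synset matching, would give different numbers on the same output.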
Creative Commons License - Attribution 3.0 Italy