Salvatore Latino
Measuring Topic-Specific Semantic Information in Product Labels: An Embedding-Based Approach.
Supervisors: Luca Cagliero, Vito De Feo. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2025
PDF (Tesi_di_laurea)
- Thesis
Access restricted to: staff users only until 24 April 2027 (embargo date). License: Creative Commons Attribution Non-commercial No Derivatives. Download (20MB)
Abstract
Measuring how much semantic information a text conveys remains an open challenge: classical information theory quantifies uncertainty reduction but is agnostic to meaning. This thesis proposes a practical approach to quantify topic-specific semantic information in short texts. We target the domain of product labels and focus on environmental information, a context of high societal and regulatory relevance (e.g., Agenda 2030 and the forthcoming Digital Product Passport). This domain was selected as product labels usually contain concise and well-defined statements, often limited to a single claim, which minimizes ambiguity and makes them especially suitable for the quantitative analysis of semantic information. Our key idea is to estimate information coverage with respect to reference sentences, crafted in accordance with the Green Claims Directive issued by the European Union, that are assumed to be maximally informative.
Candidate sentences are embedded with sentence-transformer models into a shared embedding space, where clustering is applied and distance metrics are used to assign informativeness scores to individual sentences, which are then aggregated to obtain a label-level score.
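The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state which distance metric, clustering method, or aggregation rule is used. The sketch assumes cosine similarity to the closest reference sentence as the sentence score and the arithmetic mean as the label-level aggregation, and it skips the clustering step entirely. The function names and the toy 3-dimensional vectors, which stand in for real sentence-transformer embeddings, are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_score(candidate, references):
    """Assumed scoring rule: similarity to the closest reference, clipped to [0, 1]."""
    best = max(cosine_similarity(candidate, ref) for ref in references)
    return min(1.0, max(0.0, best))


def label_score(candidates, references):
    """Assumed aggregation: mean of the sentence scores (0.0 for an empty label)."""
    if not candidates:
        return 0.0
    total = sum(sentence_score(c, references) for c in candidates)
    return total / len(candidates)


# Toy vectors standing in for embeddings of reference and label sentences.
reference_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
label_embeddings = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

scores = [sentence_score(c, reference_embeddings) for c in label_embeddings]
overall = label_score(label_embeddings, reference_embeddings)
```

In this toy setup the first label sentence coincides with a reference and the second is orthogonal to both, so the two sentence scores sit at opposite ends of the scale and the label-level score falls between them. With real embeddings, the vectors would come from a sentence-transformer model rather than being written by hand.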
