Emanuele Mottola
Design of a document retrieval system using Transformer-based models and a domain specific ontology.
Supervisors: Antonio Vetro', Juan Carlos De Martin, Giuseppe Futia. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2020
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
The scientific literature and internal research documents that every institution produces are a key source of information for the members of the institution itself. To access this material effectively and to retrieve the needed information beyond what a keyword-based approach allows, a Transformer-based language model tailored to the semiconductor supply chain domain is employed together with an ontology of the same domain, the Digital Reference [1], to build a document retrieval system over the pool of documents of the Infineon Corporate Supply Chain Innovation department. The Bidirectional Encoder Representations from Transformers (BERT) model [2] is further pre-trained on a text corpus based on the semiconductor supply chain literature, and the resulting model is used by SentenceBERT [3] to create sentence embeddings. By measuring the similarity score between the embedding of the query and the sentence embeddings of the documents, the system retrieves the documents relevant to the query posed by the user. The classes of the Digital Reference are annotated with the same mechanism, resulting in an ontology populated with documents, which are shown to the user according to the match between query keywords and class names. The first results of the system are presented: the F-measure reaches 0.58 and the mean Average Precision 0.45. |
---|---|
Supervisors: | Antonio Vetro', Juan Carlos De Martin, Giuseppe Futia |
Academic year: | 2020/21 |
Publication type: | Electronic |
Number of Pages: | 85 |
Subjects: | |
Degree course: | Master's degree course in Computer Engineering (Ingegneria Informatica) |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Joint supervision institution: | KARLSRUHE INSTITUTE OF TECHNOLOGY (GERMANY) |
Collaborating companies: | Infineon Technologies AG |
URI: | http://webthesis.biblio.polito.it/id/eprint/16055 |
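The retrieval step described in the abstract, scoring documents by the similarity between the query embedding and the documents' sentence embeddings, can be illustrated with a minimal sketch. It is not the thesis implementation: the three-dimensional vectors are toy stand-ins for SentenceBERT embeddings, cosine similarity is assumed as the similarity score, scoring a document by its best-matching sentence is an assumed aggregation rule, and the names `cosine`, `rank_documents`, `toy_docs` and `toy_query` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_sentence_vecs, top_k=3):
    """Rank documents by their best sentence-level similarity to the query.

    doc_sentence_vecs maps a document id to a list of sentence embeddings.
    Taking the maximum over sentences is an assumption made for this sketch.
    """
    scores = {
        doc_id: max((cosine(query_vec, s) for s in sentences), default=0.0)
        for doc_id, sentences in doc_sentence_vecs.items()
    }
    # Highest similarity first, truncated to the top_k results.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy embeddings: in a real system these would come from a sentence encoder.
toy_docs = {
    "doc_wafer": [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1]],
    "doc_logistics": [[0.1, 0.9, 0.2]],
    "doc_hr": [[0.0, 0.1, 0.9]],
}
toy_query = [0.8, 0.2, 0.0]

ranking = rank_documents(toy_query, toy_docs, top_k=2)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors the query points mostly along the first dimension, so `doc_wafer` ranks first and `doc_logistics` second. The same scoring could in principle be applied to ontology classes instead of documents, which is how the abstract describes the annotation of the Digital Reference.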