Luca Villani
Leveraging the Visual Capabilities of Transformers in Multimodal Machine Translation.
Supervisors: Luca Cagliero, Lorenzo Vaiani. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
Machine translation (MT) has advanced considerably since the arrival of Deep Neural Networks (DNNs). The introduction of the Transformer architecture, with its flexible data handling, opened the door to a new field: Multimodal Machine Translation (MMT). MMT aims to combine text with other information, such as images, to improve translation accuracy. Although MMT is growing rapidly, challenges remain. One is the scarcity of data that pairs different modalities with translations. Another is how to represent different data types effectively and then combine them in a way that captures the overall meaning. This thesis proposes a new architecture built on three Transformers: one for text, one for a general image representation, and one for detecting objects in the image. The goal is to determine whether using both general and object-specific image features improves translation quality. The research also focuses on keeping the architecture lightweight, in contrast to the current trend toward increasingly complex models. Experiments were conducted with two Transformer sizes ("Tiny" and "Small") on translation from English to German, French, and Czech. The results show that the proposed approach works well for German and French, whereas for Czech only the general image representation has led to improvements so far. |
---|---|
Supervisors: | Luca Cagliero, Lorenzo Vaiani |
Academic year: | 2023/24 |
Publication type: | Electronic |
Number of Pages: | 84 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science and Engineering |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Collaborating companies: | UNSPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/31833 |