
Leveraging the Visual Capabilities of Transformers in Multimodal Machine Translation

Luca Villani

Leveraging the Visual Capabilities of Transformers in Multimodal Machine Translation.

Supervisors: Luca Cagliero, Lorenzo Vaiani. Politecnico di Torino, Master's degree course in Data Science And Engineering, 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Machine translation (MT) has advanced considerably since the advent of Deep Neural Networks (DNNs). The introduction of the Transformer architecture, which handles different data types flexibly, opened the door to a new field: Multimodal Machine Translation (MMT). MMT combines text with other information, such as images, to improve translation accuracy. Although MMT is growing rapidly, challenges remain. One is the scarcity of data that pairs multiple modalities with translations. Another is how to represent each data type effectively and then combine the representations in a way that captures the overall meaning. This thesis proposes a new architecture built on three Transformers: one for the text, one for a general representation of the image, and one for detecting objects in the image. The goal is to determine whether using both general and object-specific image features improves translation quality. In addition, the research favors a lightweight architecture, in contrast to the current trend toward increasingly complex models. Experiments were conducted with two Transformer sizes ("Tiny" and "Small") on translation from English into German, French, and Czech. The results show that the proposed approach works well for German and French, whereas for Czech only the general image representation has led to improvements so far.
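The abstract does not say how the outputs of the three Transformers are combined. As a purely illustrative sketch, and not the method used in the thesis, one simple option is to project the text, global-image, and object features to a shared width, concatenate them along the sequence axis, and let the decoder attend over the result. All names, shapes, and the random inputs below are hypothetical.

```python
import numpy as np

def project(features, weight):
    """Linearly map features of shape (n, d_in) to the shared width (n, d_model)."""
    return features @ weight

def fuse_streams(text, image, objects):
    """Concatenate the three encoder outputs along the sequence axis (assumed fusion)."""
    return np.concatenate([text, image, objects], axis=0)

def cross_attention(queries, memory):
    """Single-head scaled dot-product attention of decoder queries over the fused memory."""
    d_model = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d_model)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over memory positions
    return weights @ memory, weights

# Hypothetical stand-ins for the three encoder outputs (random, for shape checking only).
rng = np.random.default_rng(0)
d_model = 16
text_out = rng.normal(size=(7, d_model))  # 7 source tokens
image_out = project(rng.normal(size=(4, 32)), rng.normal(size=(32, d_model)))   # 4 global image features
object_out = project(rng.normal(size=(3, 24)), rng.normal(size=(24, d_model)))  # 3 detected objects

memory = fuse_streams(text_out, image_out, object_out)
decoder_queries = rng.normal(size=(5, d_model))  # 5 target-side positions
context, attn = cross_attention(decoder_queries, memory)
print(memory.shape, context.shape)
```

Concatenation keeps each stream's features intact and leaves it to the attention weights to decide how much each target position draws on text, the whole image, or individual objects. Other fusion schemes (gating, separate attention per stream) would fit the abstract's description equally well.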

Supervisors: Luca Cagliero, Lorenzo Vaiani
Academic year: 2023/24
Publication type: Electronic
Number of pages: 84
Subjects:
Degree course: Master's degree course in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/31833