Ahmad Sidani
Visual Context Meets Translation: A CycleGAN Approach to Multimodal Neural Machine Translation.
Rel. Luca Cagliero, Giuseppe Gallipoli. Politecnico di Torino, Corso di laurea magistrale in Data Science And Engineering, 2025
PDF (Tesi_di_laurea) - Thesis
Restricted access: staff users only until 25 July 2026 (embargo date). License: Creative Commons Attribution Non-commercial No Derivatives. Download (2MB)
Abstract
Neural Machine Translation (NMT) has made remarkable progress with the Transformer architecture, but it remains challenged by ambiguous or context-dependent language that can benefit from visual context. Multimodal Machine Translation (MMT) addresses this by incorporating information from images into the translation process to improve accuracy, especially for image-related or ambiguous content. However, most MMT approaches rely on multimodal parallel corpora, which are scarce for many language pairs. This thesis introduces a CycleGAN-based multimodal translation architecture that can be trained without direct sentence-pair annotations by using images as a pivot between languages. The model synthesizes visual-semantic representations from CLIPTrans so that the source- and target-language representations are aligned in a common vision-language feature space.
A cycle-consistent learning objective is used: the system produces a translation and then translates the resulting string back into the source language to reconstruct the input sentence, enforcing semantic consistency without access to ground-truth translations.
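The cycle-consistency idea described above can be illustrated with a minimal toy sketch. This is not the thesis model: the "translation" directions here are stand-in linear maps on sentence embeddings, chosen only to show how a reconstruction loss can supervise training without parallel sentence pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared vision-language feature space of dimension d (assumption).
d = 4
W_fwd = rng.normal(size=(d, d))   # toy source -> target "translation" map
W_bwd = np.linalg.inv(W_fwd)      # toy target -> source back-translation map


def translate(x, W):
    """Map a sentence embedding through one (toy) translation direction."""
    return W @ x


def cycle_consistency_loss(x):
    """|| back-translate(translate(x)) - x ||^2.

    Only the source embedding x is needed: supervision comes from
    reconstructing the input, not from a ground-truth translation.
    """
    y = translate(x, W_fwd)       # forward pass: source -> target
    x_rec = translate(y, W_bwd)   # backward pass: target -> source
    return float(np.sum((x_rec - x) ** 2))


x = rng.normal(size=d)            # a source-sentence embedding
loss = cycle_consistency_loss(x)
print(loss)                       # near zero when the cycle is exact
```

In the actual architecture, the two linear maps would be replaced by the learned forward and backward translation models, and the loss would be minimized over the training corpus alongside the adversarial objectives.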
