Arda Eren Dogru
Encoder-Only Multi-Task Learning.
Supervisors: Carlo Masone, Fabio Cermelli. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2025
Abstract
Multi-task learning (MTL) promises to enhance the efficiency of computer vision systems by solving multiple tasks, such as dense prediction and recognition, within a single model. However, this paradigm is often hindered by negative transfer, where conflicting task objectives degrade performance: a critical issue for real-time applications in fields like autonomous driving and robotics. This thesis challenges the prevailing assumption that mitigating negative transfer requires complex, task-specific decoders. We introduce a minimalist, encoder-only framework grounded in deep architectural unification: heterogeneous tasks share a single ViT backbone and a uniform query-to-mask spatial projection, while polymorphic lightweight heads produce task-specific outputs. Our approach synthesizes two powerful frameworks for multi-task learning.
We adopt the query-based, encoder-only prediction approach from EoMT, which operates within the final layers of a DINOv2 (ViT) backbone.
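The architecture described above can be illustrated with a minimal numerical sketch. All names, dimensions, and head definitions below are illustrative assumptions, not the thesis's actual implementation: it only shows the general shape of the idea, where a shared encoder's final layer yields patch tokens and query tokens, a uniform query-to-mask projection produces spatial masks, and lightweight per-task heads read out task-specific predictions from the same queries.

```python
import numpy as np

# Hypothetical encoder-only multi-task sketch (illustrative, not the
# thesis's API). A shared ViT-style encoder produces patch tokens;
# learned queries are processed jointly with them in the final layers.
rng = np.random.default_rng(0)

D = 16          # token / query embedding dimension
H = W = 4       # patch grid (H * W patch tokens)
N_QUERIES = 5   # learned queries shared across tasks

# Stand-ins for the shared encoder's final-layer outputs.
patch_tokens = rng.standard_normal((H * W, D))
query_tokens = rng.standard_normal((N_QUERIES, D))

# Uniform query-to-mask spatial projection: each query's mask logits
# are its dot products with every patch token, reshaped to the grid.
mask_logits = query_tokens @ patch_tokens.T          # (N_QUERIES, H*W)
masks = mask_logits.reshape(N_QUERIES, H, W)

# Polymorphic lightweight heads: the same query embeddings feed
# different linear heads depending on the task (dimensions assumed).
W_cls = rng.standard_normal((D, 10))    # e.g. 10 semantic classes
W_depth = rng.standard_normal((D, 1))   # e.g. one depth value per query

class_logits = query_tokens @ W_cls     # (N_QUERIES, 10)
depth_values = query_tokens @ W_depth   # (N_QUERIES, 1)

print(masks.shape, class_logits.shape, depth_values.shape)
```

The point of the sketch is that the spatial projection is identical for every task; only the final linear heads differ, which is what keeps the decoder-free design lightweight.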