From Euclidean To Hyperbolic Vision-Language Spaces: A Study of Attribute–Object Compositionality

Meelad Dashti

Supervisors: Tatiana Tommasi, Nicola Strisciuglio. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2026

PDF (Tesi_di_laurea) - Thesis
Access restricted to: staff users only until 27 March 2027 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Humans can recognize novel combinations of known concepts with little to no effort. Even something as unlikely as a blue apple is immediately understood, not because we have seen one before, but because we can mentally separate “blue” from “apple” and recombine them. This compositional understanding lets us generalize beyond direct experience, which raises the question: can vision models do the same? This thesis examines compositionality in vision-only and vision-language models across both spherical and hyperbolic embedding geometries. Prior work introduced Geodesically Decomposable Embeddings (GDE), a framework that projects embeddings onto the tangent space at their intrinsic mean and extracts separate attribute and object directions.
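As a rough illustration of the tangent-space idea described above, the following minimal sketch covers the spherical case only. It is not the thesis's or the original GDE implementation: the function names, the toy 3-dimensional embeddings, and the choice to obtain each attribute or object direction by averaging tangent vectors are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

def mean_vec(vectors):
    # Component-wise arithmetic mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def log_map(mu, x):
    # Sphere log map: tangent vector at mu pointing toward x,
    # with length equal to the geodesic distance between them.
    c = max(-1.0, min(1.0, dot(mu, x)))
    theta = math.acos(c)
    ortho = [xi - c * mi for xi, mi in zip(x, mu)]
    n = norm(ortho)
    if n < 1e-12:
        return [0.0] * len(mu)
    return [theta * o / n for o in ortho]

def exp_map(mu, v):
    # Sphere exp map: walk from mu along tangent vector v.
    t = norm(v)
    if t < 1e-12:
        return list(mu)
    return normalize([math.cos(t) * m + math.sin(t) * vi / t
                      for m, vi in zip(mu, v)])

def intrinsic_mean(points, iters=100, tol=1e-10):
    # Iterative estimate of the intrinsic (Frechet) mean on the sphere.
    mu = normalize(mean_vec(points))
    for _ in range(iters):
        step = mean_vec([log_map(mu, p) for p in points])
        if norm(step) < tol:
            break
        mu = exp_map(mu, step)
    return mu

def decompose(emb):
    # emb maps (attribute, object) -> unit vector (hypothetical layout).
    mu = intrinsic_mean(list(emb.values()))
    tangent = {k: log_map(mu, x) for k, x in emb.items()}
    attrs = sorted({a for a, _ in emb})
    objs = sorted({o for _, o in emb})
    # Assumption: a direction is the mean tangent vector over all pairs
    # that share the given attribute (or object).
    attr_dir = {a: mean_vec([v for (a2, _), v in tangent.items() if a2 == a])
                for a in attrs}
    obj_dir = {o: mean_vec([v for (_, o2), v in tangent.items() if o2 == o])
               for o in objs}
    return mu, attr_dir, obj_dir

def compose(mu, attr_dir, obj_dir, attribute, obj):
    # Add the two directions in the tangent space, then map back to the sphere.
    v = [a + o for a, o in zip(attr_dir[attribute], obj_dir[obj])]
    return exp_map(mu, v)

# Toy embeddings (made up for illustration, not real model outputs).
emb = {
    ("blue", "apple"): normalize([1.0, 0.3, 0.2]),
    ("blue", "car"): normalize([1.0, 0.3, -0.2]),
    ("red", "apple"): normalize([1.0, -0.3, 0.2]),
    ("red", "car"): normalize([1.0, -0.3, -0.2]),
}
mu, attr_dir, obj_dir = decompose(emb)
recomposed = compose(mu, attr_dir, obj_dir, "blue", "apple")
```

In this deliberately symmetric toy setup the recomposed vector coincides with the original ("blue", "apple") embedding; with real embeddings the reconstruction would in general only be approximate.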

However, GDE was originally applied only to spherical vision-language models under a single inference method