
GUI Representation Learning for Downstream Real-World Applications

Francesca Russo

GUI Representation Learning for Downstream Real-World Applications.

Supervisors: Luigi De Russis, Tommaso Calo'. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Recent advancements in Artificial Intelligence (AI) have enabled tools that assist professionals across various industries with tasks such as realistic character design in gaming and news analysis and fact-checking in journalism. One industry benefiting from these advancements is User Interface (UI) design, where AI-based tools play a key role in enhancing efficiency and creativity. Graphical User Interface (GUI) design is the process of designing the visual layout of a software application, focusing on the appearance, functionality and usability of the interface that users interact with. Designers today rely on products such as Figma that facilitate the creation, prototyping and testing of GUIs. Figma, a real-time collaborative platform, has recently integrated novel AI features, with many more to come.
However, many datasets represent GUIs in a verbose format that may not be structured well enough for AI models to achieve optimal performance. Moreover, existing GUI datasets are not built for seamless integration of AI models within Figma's environment. To address these issues, a learned approach is proposed to extract a meaningful representation of GUI information, alongside the introduction of a new Figma-compatible hierarchical dataset. The objective is to facilitate both the development and the deployment of new AI models for downstream real-world applications in GUI design. Specifically, this work involves training a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a codebook of quantized latent embeddings, updated via Exponential Moving Average (EMA), that captures the relationships between GUI elements. This learned vocabulary can subsequently be used to train models for downstream real-world applications, such as GUI component generation.
To validate the approach, the VQ-VAE is first trained and tested on bounding box reconstruction and category classification only, using both the well-known Rico dataset and the newly proposed Figma Layout User Interface Dataset (FLUID), which contains JSON files that can be imported directly into Figma. Next, GUI data from FLUID that incorporate additional elements, such as image embeddings, text embeddings, and background colors, are fed to the VQ-VAE to evaluate its ability to encode these more complex features. The results demonstrate the VQ-VAE's capability to reconstruct and classify GUI items when trained on a limited set of features, particularly when using Rico. However, when the more complex features from FLUID are introduced, the model's performance drops, highlighting the need for further optimization of both its architecture and its training procedure.
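
The codebook update mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation, which this record does not show: the class name `EMACodebook`, the helper `nearest_code`, the parameters `decay` and `eps`, and the toy two-dimensional inputs are all assumptions made for illustration. The sketch covers only nearest-neighbour quantization and the EMA codebook update commonly used with VQ-VAEs; it omits the encoder, the decoder and the training losses, and uses plain Python lists instead of a tensor library.

```python
import random


def nearest_code(codebook, z):
    """Index of the codebook vector closest to z (squared Euclidean distance)."""
    def sq_dist(e):
        return sum((a - b) ** 2 for a, b in zip(e, z))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


class EMACodebook:
    """Hypothetical helper: a codebook updated by exponential moving averages."""

    def __init__(self, num_codes, dim, decay=0.99, eps=1e-5, seed=0):
        rng = random.Random(seed)
        self.decay = decay
        self.eps = eps
        self.codebook = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                         for _ in range(num_codes)]
        # EMA of how many encoder outputs were assigned to each code.
        self.cluster_size = [1.0] * num_codes
        # EMA of the sum of encoder outputs assigned to each code.
        self.embed_sum = [list(e) for e in self.codebook]

    def quantize(self, batch):
        """Map each encoder output in the batch to its nearest code index."""
        return [nearest_code(self.codebook, z) for z in batch]

    def update(self, batch):
        """Assign the batch to codes, then move each code toward its assigned mean."""
        indices = self.quantize(batch)
        num_codes, dim = len(self.codebook), len(self.codebook[0])
        counts = [0.0] * num_codes
        sums = [[0.0] * dim for _ in range(num_codes)]
        for z, k in zip(batch, indices):
            counts[k] += 1.0
            for d in range(dim):
                sums[k][d] += z[d]
        for k in range(num_codes):
            self.cluster_size[k] = (self.decay * self.cluster_size[k]
                                    + (1.0 - self.decay) * counts[k])
            for d in range(dim):
                self.embed_sum[k][d] = (self.decay * self.embed_sum[k][d]
                                        + (1.0 - self.decay) * sums[k][d])
        total = sum(self.cluster_size)
        for k in range(num_codes):
            # Laplace smoothing keeps the division stable for rarely used codes.
            smoothed = ((self.cluster_size[k] + self.eps)
                        / (total + num_codes * self.eps) * total)
            self.codebook[k] = [s / smoothed for s in self.embed_sum[k]]
        return indices


# Toy usage: repeated updates pull the selected code toward the data.
cb = EMACodebook(num_codes=4, dim=2, decay=0.9)
toy_batch = [[5.0, 5.0]] * 8  # stand-in for encoder outputs
for _ in range(200):
    idx = cb.update(toy_batch)
```

In this toy run every input is identical, so the single code that is selected drifts toward `[5.0, 5.0]` while the unused codes receive no new assignments. In an actual VQ-VAE the inputs would be encoder outputs for GUI elements and the returned indices would form the discrete vocabulary described above.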

Supervisors: Luigi De Russis, Tommaso Calo'
Academic year: 2024/25
Publication type: Electronic
Number of pages: 94
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New regulations > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/32941