From Images to Code: Leveraging Computer Vision and Large Language Models for Front-End Automation

Domenico Manuardi

Supervisor: Luigi De Russis. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Recent advances in Artificial Intelligence have catalyzed innovation across diverse domains, including software engineering. Applications of modern Large Language Models (LLMs) in this field have so far centered on code-writing assistants that generate code from a given context or from a textual description of the desired outcome. This thesis presents a novel approach: a coding assistant that uses the multi-modal capabilities of state-of-the-art LLMs to support front-end development starting from a graphical representation supplied as an image. The proposed system applies computer vision techniques to detect and analyze graphical elements, then uses a multi-modal LLM to translate these elements into an intermediate, language-agnostic format, which is finally converted into the desired target language.

This research also encompasses the practical implementation of the solution, realized as a web application designed for internal usage at Blue Reply