Politecnico di Torino

Web UI code generation: a transformer-based model applied to real-world screenshots

Giuseppe Salvi


Supervisors: Luigi De Russis, Tommaso Calò. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2023

PDF (Tesi_di_laurea) - Thesis (10MB)
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

The automatic generation of code for Web User Interfaces (UIs) promises to be a revolutionary step in the digital landscape, bridging the gap between design and development. This technology extracts source code from a website's visual representation and makes website creation more accessible by removing the need for any coding knowledge. The task is non-trivial because of the intrinsic nature of the HTML and CSS languages: the complex interactions between code and graphic elements allow the same visual result to be achieved in multiple ways, and a single line of code can significantly affect the appearance of many components or of the entire website.

Recently, deep neural network models, particularly transformer-based ones, have demonstrated the ability to process and understand long texts without losing their overall meaning. They have also shown remarkable results in tasks requiring complex logical skills, such as code generation. This new technological frontier has opened up new possibilities for the automatic generation of Web UI code. Because of their size and architecture, however, these models require vast amounts of diverse, high-quality data to generalize effectively to new inputs. Synthetic datasets, limited by the finite combinations of their generating elements, tend to be smaller and to contain samples with simple, easily distinguishable patterns.

For this reason, the thesis investigated new techniques for scraping Web UIs directly from real-world websites. I developed an automatic tool that processes a list of websites and obtains the screenshot and source code of each home page, minimizing noise and keeping only the code closely related to the rendered interface. The tool not only retrieves the HTML/CSS code but also corrects errors, replaces tags and attributes, removes comments, and adjusts code formatting. Furthermore, the script removes HTML tags and CSS rules that do not affect the visual appearance of the website screenshot.
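The abstract does not describe how the script decides which CSS rules affect the screenshot; a rendering-based comparison in a browser would be the most faithful approach. The sketch below is a deliberately simplified, static stand-in, not the thesis's implementation: it keeps a CSS rule only if the rightmost compound selector of at least one of its selectors names tags, classes and ids that actually occur in the HTML. The names `SelectorIndex`, `_compound_matches` and `prune_css` are hypothetical. It assumes a flat stylesheet (at-rules such as `@media` or `@font-face` are not handled correctly) and simply ignores pseudo-classes and attribute selectors when matching. It uses only the Python standard library (3.9+).

```python
import re
from html.parser import HTMLParser


class SelectorIndex(HTMLParser):
    """Collects the tag names, classes and ids that occur in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: set[str] = set()
        self.classes: set[str] = set()
        self.ids: set[str] = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # html.parser already lower-cases tag names
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())
            elif name == "id" and value:
                self.ids.add(value)


# Pseudo-classes/elements (":hover", "::before", ":not(...)") and attribute selectors.
_IGNORED = re.compile(r"::?[\w-]+(?:\([^)]*\))?|\[[^\]]*\]")


def _compound_matches(compound: str, index: SelectorIndex) -> bool:
    """True if the tag, classes and ids named in one compound selector all occur."""
    compound = _IGNORED.sub("", compound)
    tag = re.match(r"[a-zA-Z][\w-]*", compound)
    if tag and tag.group(0).lower() not in index.tags:
        return False
    classes = re.findall(r"\.([\w-]+)", compound)
    ids = re.findall(r"#([\w-]+)", compound)
    return all(c in index.classes for c in classes) and all(i in index.ids for i in ids)


def prune_css(css: str, html: str) -> str:
    """Keep only the flat CSS rules that could style some element of `html`."""
    index = SelectorIndex()
    index.feed(html)
    index.close()
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop CSS comments
    kept = []
    for selectors, body in re.findall(r"([^{}@]+)\{([^{}]*)\}", css):
        for selector in selectors.split(","):
            # Only the rightmost compound selector names the element being styled.
            parts = re.split(r"[\s>+~]+", selector.strip())
            if parts and parts[-1] and _compound_matches(parts[-1], index):
                kept.append(f"{selectors.strip()} {{{body.strip()}}}")
                break
    return "\n".join(kept)
```

For example, with the HTML `<div class="card"><p id="intro">Hi</p></div>`, a rule such as `.unused { color: red; }` is dropped, while `.card`, `#intro:hover, span` and `div > p` are kept. A rule that survives this static test may still be invisible (for instance, overridden by a later rule), which is why a rendering-based check would be more precise.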
The cleanup operations described above reduce the number of lines by an average of 90% for CSS files and 18% for HTML files. All image references are then replaced with a single default image to minimize the resources the website loads, and the screenshots are captured. Next, the pipeline applies a framework detector and a classifier, trained on human-evaluated screenshots, to discard poor results. Using this tool, a dataset of over 34,000 samples was created from websites listed in the Majestic Million, a ranking of domains by the number of referring subnets.

Finally, I fine-tuned Google's Pix2Struct to predict code from website screenshots. Pix2Struct is a transformer-based image-to-text model designed to process images containing visually situated language and to produce structured text. I demonstrated its effectiveness on Pix2Code, a simple synthetic Web UI dataset, on which it consistently predicted accurate outputs. After modifying the model to process longer texts, I further fine-tuned it on a new synthetic Bootstrap dataset, a variant with sketches, and the newly created real-world dataset. The results demonstrate Pix2Struct's capability to handle complex website code generation tasks.
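The image-substitution step (replacing every image reference with a default one before the screenshots are captured) can be approximated with a few regular expressions. This is a hypothetical sketch, not the thesis's code: `DEFAULT_IMAGE`, `replace_html_images` and `replace_css_images` are illustrative names. It rewrites the quoted `src` of every `<img>` tag, drops `srcset` attributes so that the browser cannot fall back to the original files, and replaces CSS `url(...)` references that point to common image formats. Regular expressions are not an HTML parser, so unquoted attributes, lazy-loading attributes such as `data-src`, and unusual markup are left untouched.

```python
import re

# Hypothetical placeholder path; any small local image would do.
DEFAULT_IMAGE = "placeholder.png"

# Quoted src attribute of an <img> tag; "\s" before "src" avoids matching data-src.
_IMG_SRC = re.compile(r"""(<img\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)
# Quoted srcset attribute on any tag (e.g. <img> or <source>).
_SRCSET = re.compile(r"""\s+srcset\s*=\s*(["']).*?\1""", re.IGNORECASE)
# CSS url(...) pointing at a common image format, with an optional query string.
_CSS_IMAGE_URL = re.compile(
    r"""url\(\s*(["']?)[^)"']*\.(?:png|jpe?g|gif|webp|svg)(?:\?[^)"']*)?\1\s*\)""",
    re.IGNORECASE,
)


def replace_html_images(html: str, default: str = DEFAULT_IMAGE) -> str:
    """Point every quoted <img src> at `default` and drop srcset attributes."""
    html = _SRCSET.sub("", html)
    return _IMG_SRC.sub(lambda m: f"{m.group(1)}{m.group(2)}{default}{m.group(2)}", html)


def replace_css_images(css: str, default: str = DEFAULT_IMAGE) -> str:
    """Replace url(...) references to image files in a stylesheet with `default`."""
    return _CSS_IMAGE_URL.sub(lambda m: f'url("{default}")', css)
```

Font files referenced through `url(...)`, such as `.woff2` in an `@font-face` rule, do not match the image-extension pattern and are therefore kept, so text still renders with the site's typography.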

Supervisors: Luigi De Russis, Tommaso Calò
Academic year: 2023/24
Publication type: Electronic
Number of pages: 137
Subjects:
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/28535