Nima Shamandi Honejani
Utilizzo di strumenti basati sull'intelligenza artificiale per la conservazione e l'analisi del patrimonio architettonico Persiano = Leveraging artificial intelligence tools for the preservation and analysis of Persian architectural heritage.
Supervisor: Gianvito Urgese. Politecnico di Torino, Master's degree programme in Digital Skills For Sustainable Societal Transitions, 2024
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
This study examines the potential of new artificial intelligence (AI) applications in the field of Persian architectural heritage. We use advanced generative models and take Persian architecture, renowned for its intricate ornamentation, as a case study for preserving such intricate designs. For this purpose, we compiled about 9,000 images from Wikimedia Commons, covering 90 traditional architectural sites in 13 cities in Iran. We enriched these images with granular metadata describing each picture's specific architectural features, style, and period, which gives every image a comprehensive context and assists the training of generative algorithms. This made the dataset richer and more informative and helped the generative models mimic Persian architecture more accurately and at a finer level of detail. We ran two experiments to test the performance of three AI tools: DALL·E 3, RunwayML, and Stable Diffusion. The first experiment generated images from text descriptions alone. The descriptions were derived from the architectural features recorded in the dataset and conveyed the characteristics of Persian architectural elements in detail, without referring back to the sample images. This experiment tested the models' text-to-image capabilities, measured by how well the generated pictures aligned with, and were faithful to, the prompt description. In the second experiment, we provided the AI tools with both textual descriptions and source images from the dataset. This broader input was designed to guide the models more closely, letting them combine visual cues with textual context to produce an accurate image that fits its context.
Combining text and image inputs was expected to boost the models' performance by giving them a better understanding of architectural elements. The generated images were evaluated for alignment, fidelity, and visual quality. To give a well-rounded, robust evaluation of the models' output, we used multiple assessment methods: CLIPScore, written observations from GPT-4o, and human evaluations. CLIPScore quantified how well the generated images matched their descriptions, while GPT-4o assessed coherence and relevance. Human evaluation was indispensable for determining whether the generated images were visually appropriate and contextually relevant, not just technically correct model output. The results indicate that DALL·E 3 is a state-of-the-art image synthesizer, offering higher-quality images and the highest fidelity of the tools tested. DALL·E 3 generated highly detailed and culturally relevant images that were consistently closely aligned with the descriptions and references provided by humans, reflecting its maturity as a generative model. Combining human reviews with automated assessments can therefore offer a fuller picture of whether the images a model generates meet technical and cultural standards. |
---|---|
Supervisors: | Gianvito Urgese |
Academic year: | 2023/24 |
Publication type: | Electronic |
Number of pages: | 138 |
Subjects: | |
Degree programme: | Master's degree programme in Digital Skills For Sustainable Societal Transitions |
Degree class: | New system > Master's degree > LM-91 - Techniques and Methods for the Information Society |
Collaborating companies: | Politecnico di Torino |
URI: | http://webthesis.biblio.polito.it/id/eprint/31660 |
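
The abstract names CLIPScore as the automated measure of how well a generated image matches its description. The sketch below shows one way such a score could be computed. It assumes the commonly cited definition (2.5 times the cosine similarity between the CLIP image and text embeddings, clipped at zero) and assumes the embeddings have already been produced by a CLIP model. The function names and the toy vectors are hypothetical, and this record does not state which CLIPScore variant or implementation the thesis used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def clip_score(image_embedding, text_embedding, weight=2.5):
    """Assumed CLIPScore form: weight * max(cos(image, text), 0).

    weight=2.5 follows the commonly cited definition; some libraries
    rescale by 100 instead, so scores are not comparable across variants.
    """
    similarity = cosine_similarity(image_embedding, text_embedding)
    return weight * max(similarity, 0.0)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real CLIP embeddings,
    # which would come from a CLIP image encoder and text encoder.
    image_vec = [0.2, 0.7, 0.1]
    prompt_vec = [0.25, 0.65, 0.05]
    print(clip_score(image_vec, prompt_vec))
```

Clipping at zero means an image whose embedding points away from the prompt embedding scores 0 rather than a negative value, so scores for different tools stay on a common non-negative scale.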