A Multimodal Visual Language Model for Musical Encoding

Alfredo Baione

A Multimodal Visual Language Model for Musical Encoding.

Supervisor: Giuseppe Rizzo. Politecnico di Torino, Master's degree programme in Mathematical Engineering, 2024


PDF (thesis), 92 MB
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	This thesis describes the main steps of a research project conducted at the LINKS Foundation (Torino, Italy) on the implementation of a multimodal visual language model for musical encoding. The goal of the activity was to test generative AI modelling tools in order to better understand the potential of machine learning in the artistic field. In particular, the thesis shows how a denoising diffusion probabilistic model (DDPM) can be adopted to generate artistic images from a 30-second musical input. Data preparation (collection, categorization, normalization, etc.), together with the search for and implementation of a suitable model, were the key stages around which the whole project was developed.
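The abstract does not detail the model architecture, so the following is only a generic sketch of the standard DDPM training step (Ho et al., 2020), not the author's implementation. It shows the closed-form forward noising process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps and the simplified noise-prediction loss, using the linear beta schedule (1e-4 to 0.02 over 1000 steps) from the original DDPM paper. The conditioning on music is an assumption for illustration: `music_embedding` and `toy_denoiser` are hypothetical placeholders for an audio embedding of the 30-second clip and a conditional noise-prediction network.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (num_steps must be > 1)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    result, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result


def q_sample(x0, t, abar, noise):
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


def simple_loss(predicted_noise, true_noise):
    """Simplified DDPM objective: mean squared error between true and predicted noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_noise, true_noise)) / len(true_noise)


def toy_denoiser(x_t, t, music_embedding):
    """HYPOTHETICAL stand-in for a conditional network eps_theta(x_t, t, c).

    A real model would be a trained neural network conditioned on an audio
    embedding; this placeholder returns zeros so the step runs end to end.
    """
    return [0.0 for _ in x_t]


def training_step(x0, music_embedding, abar, rng):
    """One DDPM training step: sample t and eps, noise x0, score the prediction."""
    t = rng.randrange(len(abar))
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = q_sample(x0, t, abar, noise)
    predicted = toy_denoiser(x_t, t, music_embedding)
    return simple_loss(predicted, noise)


if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule(1000))
    image = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # flattened toy image in [-1, 1]
    music_embedding = [0.0] * 16  # hypothetical embedding of a 30-second clip
    print(f"alpha_bar at t=1: {abar[0]:.4f}, at t=T: {abar[-1]:.6f}")
    print("loss:", training_step(image, music_embedding, abar, rng))
```

Because alpha_bar_t shrinks towards zero as t grows, late timesteps are almost pure noise; at generation time a trained network would reverse this chain step by step, starting from Gaussian noise and steering each step with the musical conditioning.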
Supervisors:	Giuseppe Rizzo
Academic year:	2023/24
Publication type:	Electronic
Number of pages:	121
Degree programme:	Master's degree programme in Mathematical Engineering
Degree class:	New system > Master's degree > LM-44 - Mathematical-physical modelling for engineering
Collaborating companies:	FONDAZIONE LINKS
URI:	http://webthesis.biblio.polito.it/id/eprint/31587
