Giuseppe Concialdi
Ainur: Enhancing Vocal Quality through Lyrics-Audio Embeddings in Multimodal Deep Music Generation.
Supervisors: Elena Maria Baralis, Eliana Pastor, Alkis Koudounas. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2023
PDF (thesis). License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
Deep music generation is an emerging research field that faces significant challenges: the high dimensionality of audio data, heavy computational requirements, and limited output quality, particularly for generated vocals. This study addresses these challenges by introducing Ainur, a deep learning model designed specifically to improve the quality of generated vocals. We investigate how effectively various deep learning techniques and multimodal input conditioning strategies improve vocal generation. We also examine the utility of transfer learning and pre-trained models, along with the impact of multimodal input strategies on the quality and diversity of the generated music. Ainur employs a hierarchical diffusion model and a latent diffusion prior to handle high-dimensional data, and uses Contrastive Lyrics-Audio Spectrogram Pre-training (CLASP) embeddings for multimodal data fusion. Our findings show that Ainur can produce high-quality and varied music, and they support the use of the novel evaluation metrics we propose. The study also acknowledges the ethical considerations and limitations inherent to deep music generation. Given the potential implications of AI-generated music for creative integrity and the potential for misuse of such technology, we emphasize the need for responsible use. This work contributes to the deep music generation field by establishing novel methodologies, offering robust tools, and providing directions for future research, while promoting collaboration and transparency through the open-source nature of Ainur. |
---|---|
Supervisors: | Elena Maria Baralis, Eliana Pastor, Alkis Koudounas |
Academic year: | 2022/23 |
Publication type: | Electronic |
Number of pages: | 200 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science and Engineering |
Degree class: | New system > Master's degree > LM-32 - COMPUTER ENGINEERING |
Joint supervision institution: | UNIVERSITY OF ILLINOIS AT CHICAGO (UNITED STATES OF AMERICA) |
Collaborating companies: | NOT SPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/27647 |