Ainur: Enhancing Vocal Quality through Lyrics-Audio Embeddings in Multimodal Deep Music Generation

Giuseppe Concialdi

Supervisors: Elena Maria Baralis, Eliana Pastor, Alkis Koudounas. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

As an emerging research field, deep music generation faces significant challenges, such as handling the high dimensionality of audio data, computational resource requirements, and quality concerns, particularly with generated vocals. This study aims to address these concerns by introducing Ainur, an innovative deep learning model designed specifically to enhance the quality of generated vocals. We investigate the effectiveness of various deep learning techniques and multimodal input conditioning strategies to improve vocal generation. Additionally, the utility of transfer learning and pre-trained models is examined, along with the impact of multimodal input strategies on the quality and diversity of the produced music. Ainur employs a hierarchical diffusion model and a latent diffusion prior for handling high-dimensional data and uses Contrastive Lyrics-Audio Spectrogram Pre-training (CLASP) embeddings for multimodal data fusion. Our findings reveal Ainur's capability to produce high-quality and varied music, substantiating the use of our proposed novel evaluation metrics. The study also acknowledges the importance of ethical considerations and limitations inherent to deep music generation. Recognizing the potential implications of AI-generated music for creative integrity, and the potential misuse of such technology, we emphasize the need for responsible use. This work significantly contributes to the deep music generation field, establishing novel methodologies, offering robust tools, and providing directions for future research, while promoting collaboration and transparency through the open-source nature of Ainur.

Supervisors: Elena Maria Baralis, Eliana Pastor, Alkis Koudounas
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 200
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Joint supervision institution: UNIVERSITY OF ILLINOIS AT CHICAGO (UNITED STATES OF AMERICA)
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/27647