
Generative Adversarial Networks for Emotion-based Music Generation

Emanuele Aiello

Generative Adversarial Networks for Emotion-based Music Generation.

Rel. Cristina Emma Margherita Rottondi. Politecnico di Torino, Corso di laurea magistrale in Ict For Smart Societies (Ict Per La Società Del Futuro), 2021

License: Creative Commons Attribution-NonCommercial-NoDerivatives.


Generative adversarial networks (GANs) have proven capable of creating hyper-realistic faces, animating paintings, colorizing sketches, and more. These models can handle not only images but also text and audio. This thesis explores the application of GANs to music generation, with the objective of generating melodies that elicit a specific emotion in the listener, a task known as affective music composition. Emotion is an important aspect of music, and the ability to control it could find a variety of applications, such as generating soundtracks or melodies appropriate to different domains. Several algorithms have been proposed to generate music with a target emotion: one uses genetic algorithms and advanced evolutionary approaches to produce real-time music expressing various emotional states; others rely on Long Short-Term Memory (LSTM) networks or on ad hoc architectures based on Recurrent Convolutional Neural Networks (R-CNNs). To the best of my knowledge, however, the use of GANs in this field is largely unexplored.

To accomplish this goal, the state of the art in symbolic music generation for multitrack polyphonic melodies, MuseGAN, was identified and used as the starting point for a modified architecture. The original network offers no control over the characteristics of the generated pieces, and since emotion is a very complex feature, achieving such control is particularly challenging. Symbolic music generation consists in producing a representation of notes and sounds to be played by an instrument or synthesized by a computer. The most prevalent symbolic format is MIDI; however, only one dataset of emotion-tagged MIDI files is available: VGMIDI, a collection of piano pieces from video game soundtracks, of which 200 are labeled with an associated emotion and 3000 are unlabeled. The main difficulty in this project lies in data availability, because GANs require a large number of samples to be trained successfully.

Three approaches were tested. The first is based on transfer learning: a GAN is first trained in an unsupervised manner; the architecture is then converted to a conditional GAN, and training continues on the labeled dataset. The second experiment explores the latent space to control generation: an emotion classifier based on convolutional layers is trained on the labeled dataset and then used with stochastic gradient ascent to find the latent direction that maximizes a desired feature, in this case the requested emotion. The last experiment uses the unconditioned GAN to generate random melodies and the classifier to detect the target mood: melodies are generated until one matching the target mood is found. The models were evaluated by computing the statistical difference between generated and real samples over several musical metrics. Furthermore, because emotions are highly subjective, a survey was conducted to qualitatively evaluate the generation process.
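The transfer-learning approach described in the abstract can be sketched in miniature. This is a hypothetical illustration, not the thesis architecture: a single linear layer stands in for the generator, and all names and sizes are assumptions. The point is the mechanics of reusing unconditionally pretrained weights while extending the input with a one-hot emotion label.

```python
import numpy as np

# Toy sketch of the transfer-learning step: a generator is first trained
# without labels, then its weights are reused in a conditional generator
# whose input is extended with a one-hot emotion label. The single linear
# layer and all sizes below are assumptions, not the thesis architecture.
rng = np.random.default_rng(3)
LATENT, N_EMOTIONS, OUT = 8, 4, 32

W_uncond = rng.normal(size=(LATENT, OUT))   # pretrained, unconditional weights

# Extend the input: the new rows consume the label and start at zero, so the
# conditional generator initially reproduces the pretrained behaviour exactly.
W_cond = np.vstack([W_uncond, np.zeros((N_EMOTIONS, OUT))])

def generate_conditional(z, emotion_idx):
    """Conditional generation: concatenate latent vector and one-hot label."""
    x = np.concatenate([z, np.eye(N_EMOTIONS)[emotion_idx]])
    return np.tanh(x @ W_cond)

z0 = rng.normal(size=LATENT)
out = generate_conditional(z0, 2)
```

Initializing the label rows to zero is one common choice: continued training on the labeled dataset can then learn the emotion conditioning without destroying the pretrained mapping.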
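The latent-space exploration of the second experiment can likewise be sketched. In the thesis the classifier is a CNN over generated pieces; here a toy differentiable logistic scorer is assumed in its place so that the gradient-ascent mechanics stay visible. All names, dimensions, and the learning rate are hypothetical.

```python
import numpy as np

# Toy stand-in for the emotion classifier: a logistic model over a
# 16-dimensional latent space (assumed sizes, hypothetical weights).
rng = np.random.default_rng(0)
w = rng.normal(size=16)          # classifier weights (assumed)
b = 0.0

def emotion_prob(z):
    """Probability that latent vector z yields the target emotion (toy model)."""
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

def grad_log_prob(z):
    """Gradient of log p(target emotion | z) for the logistic toy model."""
    return (1.0 - emotion_prob(z)) * w

# Gradient ascent in latent space: move z along the direction that
# maximizes the classifier's score for the requested emotion.
z = rng.normal(size=16)
p_start = emotion_prob(z)
lr = 0.1
for _ in range(200):
    z = z + lr * grad_log_prob(z)
p_end = emotion_prob(z)
```

The resulting latent vector is then fed to the generator; in this sketch only the classifier score is tracked, which rises monotonically along the ascent direction.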
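The third experiment, generating melodies until the classifier reports the target mood, is a form of rejection sampling. A minimal sketch follows, with stand-in functions for both the unconditioned generator and the classifier; their interfaces and the mood labels are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate(z):
    """Stand-in for the unconditioned GAN generator (assumed interface):
    maps a latent vector to a melody feature vector."""
    return np.tanh(z)

def classify(melody):
    """Stand-in for the trained emotion classifier: returns a mood label."""
    return "happy" if melody.sum() > 0 else "sad"

def sample_with_mood(target, max_tries=1000):
    """Rejection sampling: generate melodies until the classifier
    reports the target mood, then return that melody."""
    for _ in range(max_tries):
        melody = generate(rng.normal(size=8))
        if classify(melody) == target:
            return melody
    raise RuntimeError("target mood not found within the sampling budget")

melody = sample_with_mood("sad")
```

The budget guard matters in practice: if the generator rarely produces a given emotion, the expected number of draws grows as the inverse of that emotion's frequency.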
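The quantitative evaluation, comparing statistics of musical metrics between real and generated samples, can be sketched as follows. The polyphony-rate metric and the piano-roll shapes are assumptions chosen in the spirit of MuseGAN-style evaluation, not the thesis's exact metric set.

```python
import numpy as np

def polyphony_rate(pianoroll):
    """Fraction of time steps with more than one active pitch
    (array shape assumed: time steps x pitches)."""
    active = (pianoroll > 0).sum(axis=1)
    return float((active > 1).mean())

def metric_gap(real_set, generated_set, metric):
    """Absolute difference between the metric's mean on real and generated
    samples: smaller gaps indicate more realistic statistics."""
    real_mean = np.mean([metric(p) for p in real_set])
    gen_mean = np.mean([metric(p) for p in generated_set])
    return float(abs(real_mean - gen_mean))

# Hypothetical data: random binary piano-rolls stand in for real and
# generated phrases (96 time steps, 128 MIDI pitches).
rng = np.random.default_rng(2)
real = [rng.integers(0, 2, size=(96, 128)) for _ in range(10)]
generated = [rng.integers(0, 2, size=(96, 128)) for _ in range(10)]
gap = metric_gap(real, generated, polyphony_rate)
```

The same `metric_gap` helper can be reused for any per-sample metric (empty-bar ratio, pitch-class entropy, and so on), one gap per metric.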

Relators: Cristina Emma Margherita Rottondi
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 77
Degree programme: Corso di laurea magistrale in Ict For Smart Societies (Ict Per La Società Del Futuro)
Degree class: New organization > Master science > LM-27 - TELECOMMUNICATIONS ENGINEERING
Collaborating companies: FONDAZIONE LINKS
URI: http://webthesis.biblio.polito.it/id/eprint/20613