Characterization and forecast of online music consumption dynamics.

Gianluca Boni

Supervisors: Alfredo Braunstein, Vittorio Loreto. Politecnico di Torino, Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

A dataset of 2251 songs released between 2020/01/15 and 2020/03/24 is analyzed to formulate short- and long-term predictions of popularity using machine learning techniques. These public data were collected from Spotify, the largest subscription music streaming service with 96 million subscribers and 170 million users overall, and from YouTube, one of the most popular online video-sharing platforms. The first part of the work is purely predictive and makes extensive use of machine learning, providing results of various kinds (classification, regression, etc.). In particular, we identify the features that are most informative for characterizing the Spotify Popularity, an integer between 0 and 100 indicating the success of a track within the streaming platform. Also, given a track and its related video on YouTube, we build a neural network to infer the number of views on a given day, taking into account all the information available from Spotify. Finally, the converse analysis is performed, i.e. the Spotify Popularity is predicted taking some of the YouTube statistics as input. Despite the good quality of the predictions obtained in the first part of the work, the brute-force use of machine learning algorithms does not allow one to go beyond mere prediction. In the second part of the work, therefore, we want to shed light on the interconnections between the different time series that make up the dataset. Thus, we aim to quantify the amount of information transferred from the YouTube views time series to the Spotify Popularity time series, and vice versa. The final result is a directed graph showing the flows of information between all the time series analyzed. The main achievements, as well as the drawbacks of the models adopted, are discussed.
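As an illustration of what quantifying the information transferred between two time series can look like, the following is a minimal sketch of a lag-1 transfer entropy estimate on discrete symbols. The abstract does not name the measure or the estimator used in the thesis, so the choice of transfer entropy, the plug-in (frequency count) estimator, the function name `transfer_entropy`, and the synthetic data are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of the lag-1 transfer entropy (in bits) from
    `source` to `target`, both sequences of discrete symbols.

    Hypothetical helper for illustration; not the thesis's method.
    """
    if len(source) != len(target):
        raise ValueError("series must have the same length")
    n = len(target) - 1
    if n < 1:
        raise ValueError("series must contain at least two samples")

    triples = Counter()   # counts of (y_next, y_now, x_now)
    pairs_yx = Counter()  # counts of (y_now, x_now)
    pairs_yy = Counter()  # counts of (y_next, y_now)
    singles = Counter()   # counts of y_now
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1

    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        # p(y_next | y_now, x_now): prediction using the source's past
        p_with_source = count / pairs_yx[(y_now, x_now)]
        # p(y_next | y_now): prediction using the target's own past only
        p_without_source = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * math.log2(p_with_source / p_without_source)
    return te


# Synthetic example (not thesis data): y copies x with a one-step delay,
# so information should flow from x to y but not from y to x.
random.seed(0)
x = [random.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(f"TE(x -> y) = {te_xy:.3f} bits, TE(y -> x) = {te_yx:.3f} bits")
```

Real-valued series such as daily view counts would first have to be discretized (for example into "increased" and "did not increase" symbols), and plug-in estimates are biased on short series, so the asymmetry between the two directions is more informative than the absolute values. Computing the estimate for every ordered pair of series and keeping the significant ones is one way to obtain a directed graph of information flows.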

Supervisors: Alfredo Braunstein, Vittorio Loreto
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 81
Degree programme: Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi)
Degree class: New organization > Master science > LM-44 - MATHEMATICAL MODELLING FOR ENGINEERING
Joint supervision institution: SONY CSL Paris (France)
Collaborating companies: Sony Europe BV
URI: http://webthesis.biblio.polito.it/id/eprint/15960