
Log Mining for Failure Analysis on Spark

Marco Angius

Log Mining for Failure Analysis on Spark.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Ict For Smart Societies (Ict Per La Società Del Futuro), 2019

Abstract:

In the past few years, Amadeus has decided to rely on Big Data platforms to develop advanced analytics modules and help airlines improve their monitoring of performance insights. Big Data platforms have matured since the introduction of Hadoop more than 10 years ago. Today, Apache Spark represents the next generation of data processing frameworks, providing advanced in-memory capabilities and a directed acyclic graph (DAG) engine. Spark was developed to address the limitations of MapReduce and is 10 to 100 times faster on most data processing workloads. Nevertheless, Spark applications can fail in ways that are very difficult to interpret and, therefore, to correct. This work presents a deep predictive data analysis performed on Spark application logs in order to discover failure patterns and speed up issue resolution. Based on a study of the Spark framework and of some recent scientific publications on the subject, three data sets of different granularity have been investigated: Job, Stage and Task. This real use case presented a challenging class imbalance scenario, with failures forming the minority class relative to successes. The report presents some recent techniques for data set rebalancing, which are applied in order to perform binary classification with well-known machine learning models such as Random Forest and Neural Networks. The results of this exploratory analysis helped in the resolution of technical and functional issues, improving resource optimisation for large-scale data processing applications.

Supervisors: Paolo Garza
Academic year: 2018/19
Publication type: Electronic
Number of pages: 86
Additional information: Confidential thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in Ict For Smart Societies (Ict Per La Società Del Futuro)
Degree class: New system > Master's degree > LM-27 - TELECOMMUNICATIONS ENGINEERING
Joint supervision institution: EURECOM - Telecom Paris Tech (FRANCE)
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/10960