
Anomaly Detection In The Enterprise Context Through Log Analysis

Alessandro Bozzella

Anomaly Detection In The Enterprise Context Through Log Analysis.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This research was carried out within the Technological Infrastructure business unit of Mediamente Consulting Srl. It extends earlier work on "intelligent monitoring" that began in the company and continued as university research in cooperation with the Politecnico di Torino through a previous thesis project. That project studied whether possible system faults could be predicted by applying predictive models to numerical data from the Enterprise Manager monitoring software. The present study instead asks whether the available data reveal that the system is currently in an anomalous state, and it answers this by detecting anomalies through the analysis of system logs. The business unit works mainly with Oracle technologies and products, which are the company's area of specialization. For this reason, and because the analysis needed to target the system log most often used when troubleshooting the majority of customer problems, the log file of a relational database management system was chosen: that of a customer's Oracle Database instance. Restricting the analysis to a real and specific context also makes it possible to evaluate the validity of the approach on realistic values. The framework consists of four phases: log collection, log parsing, feature extraction and anomaly detection. The first phase gathers the material to be analyzed, while the second converts the log file into a structured, well-defined form that simplifies the work of the subsequent phases.
The third phase, feature extraction, derives numerical feature vectors from the textual fields of the log file. The fourth and last phase is the most important, since it is the core of this work: as its name suggests, it identifies any anomalies present in the analyzed log file. The numerical data obtained in the previous phase are the input to several machine learning algorithms. Both supervised and unsupervised algorithms were used, in order to see which of the two types of learning was better suited to the task. Three widely used performance metrics were used to evaluate the accuracy of the supervised and unsupervised anomaly detection methods, while varying a series of parameters and, for the supervised analysis, the validation strategy. In summary, the objective of the analysis is to evaluate the effectiveness of the models examined, determining from the performance metrics whether they can predict the current state of the system (normal or abnormal) with a reasonable level of accuracy, and to provide references for future developments.
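
As an illustration only, the last three phases described in the abstract (log parsing, feature extraction, anomaly detection) might be sketched in Python as follows. Nothing here is taken from the thesis itself. The log line layout, the digit-masking rule used to obtain event templates, the fixed 60-second windows, and the distance-from-centroid threshold are all assumptions chosen to keep the example small; the thesis evaluates supervised and unsupervised machine learning algorithms that the abstract does not name, and the simple detector below is only a stand-in for them. The function names `parse_line`, `extract_features` and `detect_anomalies` are hypothetical.

```python
import math
import re
from collections import Counter
from datetime import datetime

# Hypothetical line layout "<ISO timestamp> <message>"; real database logs differ.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_line(line):
    """Log parsing: turn a raw line into (timestamp, event template), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.fromisoformat(match.group(1))
    # Mask digit runs so messages that differ only in parameters share a template.
    template = re.sub(r"\d+", "<*>", match.group(2))
    return timestamp, template


def extract_features(events, window_seconds=60):
    """Feature extraction: one event-count vector per non-empty time window."""
    vocabulary = sorted({template for _, template in events})
    start = min(ts for ts, _ in events)
    counts = {}
    for ts, template in events:
        index = int((ts - start).total_seconds() // window_seconds)
        counts.setdefault(index, Counter())[template] += 1
    windows = sorted(counts)  # windows without any event are simply absent
    vectors = [[counts[w][t] for t in vocabulary] for w in windows]
    return windows, vocabulary, vectors


def detect_anomalies(vectors, threshold=2.0):
    """Anomaly detection stand-in: flag vectors unusually far from the centroid."""
    n, dim = len(vectors), len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    distances = [math.dist(v, centroid) for v in vectors]
    mean = sum(distances) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in distances) / n)
    if std == 0:
        return [False] * n
    return [(d - mean) / std > threshold for d in distances]


# Synthetic sample: ten one-minute windows, with a burst of timeouts in minute 7.
sample_lines = []
for minute in range(10):
    sample_lines.append(f"2020-03-01T10:{minute:02d}:05 checkpoint completed in {10 + minute} ms")
    sample_lines.append(f"2020-03-01T10:{minute:02d}:35 checkpoint completed in 12 ms")
for k in range(8):
    sample_lines.append(f"2020-03-01T10:07:40 connection timeout for session {4000 + k}")

events = [e for e in map(parse_line, sample_lines) if e is not None]
windows, vocabulary, vectors = extract_features(events)
flags = detect_anomalies(vectors)
print([w for w, flagged in zip(windows, flags) if flagged])  # prints [7]
```

Counting events per time window is one common way to turn log text into numerical vectors; which textual fields and which encoding the thesis actually uses is not stated in the abstract.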

Supervisors: Paolo Garza
Academic year: 2020/21
Publication type: Electronic
Number of pages: 60
Subjects:
Degree course: Master's degree course in Computer Engineering (Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: Mediamente Consulting srl
URI: http://webthesis.biblio.polito.it/id/eprint/16603