
Toward a methodology for malware analysis and characterization for Machine Learning application

Andrea Sindoni. Toward a methodology for malware analysis and characterization for Machine Learning application. Supervisors: Antonio Lioy, Andrea Atzeni. Politecnico di Torino, Master's degree program in Computer Engineering (Ingegneria Informatica), 2023

Thesis (PDF, Tesi_di_laurea, 6MB). License: Creative Commons Attribution Non-commercial No Derivatives.
Attachments (ZIP archive, Documenti_allegati, 279kB). License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

Over the last decades, malware has been one of the major threats to IT systems, targeting both end users and organizations. Year after year, malware samples evolve, adopting new mechanisms to exploit their victims and new techniques to avoid detection. Analysis is fundamental both for identification, i.e. labelling a program as benign or malicious, and for family characterization, i.e. determining which family a given sample belongs to. A malware family is a group of samples that share strong common characteristics or that were developed by the same malicious actor.

This thesis develops an analysis and characterization methodology that leverages existing tools able to extract representative information, i.e. features, from samples, automating the extraction process as much as possible. The extracted information is then prepared to become valid input for a Machine Learning system that performs characterization. In a first phase, I surveyed the most widespread malware families today and selected those on which to focus. I then studied state-of-the-art techniques and tools for malware analysis, both to understand which general actions should be monitored when analyzing a sample and to select tools suitable for analyzing samples from the chosen families. Once the selection was made, I collected samples belonging to those families, searching all available malware collections suitable for my purposes and combining them into my own collection.
Next, I looked for methods and tools to analyze these samples and extract information describing their behavior, building an analysis system that applies both static and dynamic analysis to extract features from samples automatically. Although many kinds of information could be extracted, I had to select which features to extract, since they had to be suitable for a Machine Learning system. The extracted information was organized into feature datasets that can be used individually or combined. Beyond serving as input for ML, the extracted information was also analyzed further to identify similarities and differences between samples.
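The abstract states that feature datasets can be used individually or combined. The sketch below illustrates one possible way to combine them, assuming each dataset maps a sample identifier (for example a SHA-256 hash) to a dictionary of named numeric features. The function name `combine_feature_datasets`, the dataset names `static` and `dynamic`, and the feature names are hypothetical and not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Assumed layout (hypothetical): sample id -> {feature name -> numeric value}
FeatureSet = Dict[str, Dict[str, float]]


def combine_feature_datasets(
    datasets: Dict[str, FeatureSet],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Join named feature datasets into one sample-by-feature matrix.

    Keeps only samples present in every dataset (inner join), prefixes
    each feature with its dataset name to avoid name collisions, and
    fills features a sample lacks with 0.0.
    """
    if not datasets:
        return [], [], []

    # Samples for which every extractor (e.g. static and dynamic) produced output.
    common = set.intersection(*(set(ds) for ds in datasets.values()))
    sample_ids = sorted(common)

    # Deterministic column order: datasets by name, then features by name.
    keys: List[Tuple[str, str]] = []
    for ds_name in sorted(datasets):
        feats = set()
        for sid in sample_ids:
            feats.update(datasets[ds_name][sid])
        keys.extend((ds_name, feat) for feat in sorted(feats))
    columns = [f"{ds_name}:{feat}" for ds_name, feat in keys]

    # One row per sample; features missing for that sample default to 0.0.
    rows = [
        [float(datasets[ds_name][sid].get(feat, 0.0)) for ds_name, feat in keys]
        for sid in sample_ids
    ]
    return sample_ids, columns, rows


if __name__ == "__main__":
    static = {"a1": {"num_sections": 5, "entropy": 7.2},
              "b2": {"num_sections": 3, "entropy": 6.1}}
    dynamic = {"a1": {"files_written": 12}, "c3": {"files_written": 4}}
    ids, cols, X = combine_feature_datasets({"static": static, "dynamic": dynamic})
    print(ids)   # ['a1']
    print(cols)  # ['dynamic:files_written', 'static:entropy', 'static:num_sections']
    print(X)     # [[12.0, 7.2, 5.0]]
```

The inner join drops samples that were not processed by every analysis step (here `b2` and `c3`); an outer join with a default fill value would keep them at the cost of more missing data.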

Supervisors: Antonio Lioy, Andrea Atzeni
Academic year: 2022/23
Publication type: Electronic
Number of pages: 99
Degree program: Master's degree program in Computer Engineering (Ingegneria Informatica)
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering (INGEGNERIA INFORMATICA)
Partner companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/26797