Enhancing Malware Classification Through LSTM Algorithm Integration in Binary Classification Models

Andrea Toscano

Enhancing Malware Classification Through LSTM Algorithm Integration in Binary Classification Models.

Supervisors: Alessandro Savino, Nicolò Maunero. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.


Recent trends in cybercrime show growing damage to society and to companies' economic assets from cyber attacks, most often carried out with versatile and effective attack tools known as malware. Countermeasures taken to limit the impact of this threat often prove ineffective, or can at least be circumvented through phishing techniques. Continued research is therefore needed to limit this damage and to counter these malicious techniques. The thesis explores the most widely used techniques for recognizing malicious programs through classification: the ability, given an unknown file, to recognize its characteristics and assign it an identifying label in order to distinguish malware from safe programs. In addition, this work applies a type of Machine Learning to the field of malware classification, evaluating the benefits and performance obtained with Recurrent Neural Networks. Given the huge amount of malware found daily, the work also emphasizes automating, as far as possible, the chain of extraction, behavioral analysis, and machine learning analysis. The first step is an in-depth study of the malware analysis techniques most widely used in the scientific literature, distinguished according to the context in which the sample is studied: the main methods used in static and dynamic analysis are presented. Since automating the process is a crucial part of the work, useful tools for automating the discovery of malicious samples and the extraction of key features from them are also presented. The core of this research is the applicability of a Machine Learning algorithm, Long Short-Term Memory, to the analysis of certain features extracted from malware whose sequential order is essential.
Specifically, the algorithm learns to recognize a malicious sample based on the sequence of system API calls that the Operating System provides to the running program. Starting from a dataset containing, for each malware sample, the first 100 APIs it called, as extracted by an automated analysis, the main settings the algorithm needs to achieve the best possible performance are identified. The choices made in this work are therefore based on objective tests performed with different parameters to evaluate the algorithm's efficiency. Finally, a proof-of-concept and useful hints are provided to facilitate future work by those who wish to continue this line of research.
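
To make the pipeline described in the abstract more concrete, the following is a minimal sketch, not the thesis implementation. It shows one plausible way to turn API call traces into fixed-length integer sequences (keeping the first 100 calls) and to run them through a single LSTM layer that outputs a malware probability. Everything named here is an assumption made for illustration: the helper names (`build_vocab`, `encode`, `TinyLSTM`), the reserved padding and unknown IDs, the layer sizes, and the sample Windows API names. The weights are random and untrained, so only the forward pass is demonstrated; in practice a deep learning framework would handle the layer and its training.

```python
import math
import random

MAX_CALLS = 100  # the abstract keeps the first 100 API calls per sample
PAD_ID = 0       # assumption: ID 0 is reserved for padding short traces
UNK_ID = 1       # assumption: ID 1 marks API names missing from the vocabulary


def build_vocab(traces):
    """Map each API name to an integer ID, in order of first appearance."""
    vocab = {}
    for trace in traces:
        for api in trace[:MAX_CALLS]:
            vocab.setdefault(api, len(vocab) + 2)  # IDs 0 and 1 are reserved
    return vocab


def encode(trace, vocab):
    """Keep the first MAX_CALLS calls, map them to IDs, and right-pad."""
    ids = [vocab.get(api, UNK_ID) for api in trace[:MAX_CALLS]]
    return ids + [PAD_ID] * (MAX_CALLS - len(ids))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Untrained embedding + one LSTM layer + sigmoid output (forward pass only)."""

    def __init__(self, vocab_size, embed_dim=4, hidden=4, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.embed = mat(vocab_size, embed_dim)
        # One weight matrix and bias per gate:
        # input (i), forget (f), output (o), candidate (g).
        self.W = {name: mat(hidden, embed_dim + hidden) for name in "ifog"}
        self.b = {name: [0.0] * hidden for name in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b_out = 0.0

    def _gate(self, name, z, activation):
        return [
            activation(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(self.W[name], self.b[name])
        ]

    def predict(self, ids):
        """Return a score in (0, 1); higher would mean 'malware' after training."""
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for token in ids:
            if token == PAD_ID:
                continue  # padding carries no information, so skip it
            z = self.embed[token] + h  # concatenate embedding and previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)


if __name__ == "__main__":
    # Hypothetical traces; real ones would come from an automated dynamic analysis.
    traces = [
        ["NtCreateFile", "NtWriteFile", "NtClose"],
        ["RegOpenKeyExW", "NtClose"],
    ]
    vocab = build_vocab(traces)
    model = TinyLSTM(vocab_size=len(vocab) + 2)
    for trace in traces:
        print(model.predict(encode(trace, vocab)))
```

Because the cell state `c` is carried from one call to the next, the final score depends on the order of the API calls and not only on which calls appear, which is the property the abstract refers to as sequentiality.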

Supervisors: Alessandro Savino, Nicolò Maunero
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 78
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/30902