Machine Learning based credential code scanner

Carlo Maria Negri

Machine Learning based credential code scanner.

Supervisors: Cataldo Basile, Antonio Lioy. Politecnico di Torino, Master's degree course in Computer Engineering (Ingegneria Informatica), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Data mining on public code platforms such as GitHub has been widely used to compromise developers' privacy. Through inexperience or laziness, developers leave many secrets hardcoded in source code that is then made public. Exploiting this kind of vulnerability requires no hacking skills: sometimes credentials can be found simply by using the GitHub search bar. In 2016 Uber suffered a massive data leak affecting 57 million customers, which exposed sensitive information such as names, email addresses and phone numbers. The attack originated from leaked credentials found in a private GitHub repository. Existing tools are mostly based on regular expressions and do not achieve good results in terms of accuracy and false positive rate. This work presents an approach to detecting plain-text leaks in open-source projects that uses Machine Learning to reduce the number of false positives. To preserve data privacy, the tool was evaluated on several synthetic-data environments, achieving good results in terms of leak detection. Finally, this approach is compared with the existing ones, emphasizing the adaptability of this solution.
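To illustrate why purely regular-expression-based scanning tends to produce false positives, here is a minimal sketch of such a baseline. It is not the tool described in the thesis. The pattern, the `scan` function and the sample source are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical pattern (an assumption, not taken from the thesis): an identifier
# containing a credential-like keyword, then '=' or ':', then a quoted literal.
CANDIDATE = re.compile(
    r"""(?P<key>\w*(?:password|passwd|secret|token|api_?key)\w*)\s*[:=]\s*(?P<quote>["'])(?P<value>[^"']+)(?P=quote)""",
    re.IGNORECASE,
)


def scan(source: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, key, value) for every regex match in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CANDIDATE.finditer(line):
            hits.append((lineno, match.group("key"), match.group("value")))
    return hits


# Made-up sample: one plausible secret, two harmless lines that still match,
# and one line that does not match at all.
SAMPLE = '''\
db_password = "s3cr3t-Pr0d!"
password_label = "Enter your password"
api_key = "YOUR_API_KEY_HERE"
timeout = "30"
'''

if __name__ == "__main__":
    for hit in scan(SAMPLE):
        print(hit)
```

In this sample the pattern flags three lines, but only the first looks like a real credential. The UI label and the placeholder are false positives that a regular expression alone cannot tell apart from a genuine leak. This is the kind of candidate that a classifier placed after the pattern-matching stage could filter out.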

Supervisors: Cataldo Basile, Antonio Lioy
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 85
Degree course: Master's degree course in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: SAP Labs France
URI: http://webthesis.biblio.polito.it/id/eprint/14463