Carlo Maria Negri
Machine Learning based credential code scanner.
Supervisors: Cataldo Basile, Antonio Lioy. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2020
PDF (Tesi_di_laurea) - Thesis. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: | Data mining on public code platforms such as GitHub has been widely used to compromise developers' privacy. Due to inexperience or laziness, many personal secrets are left hardcoded in source code that is then made public. No hacking skills are needed to exploit this kind of vulnerability; sometimes credentials can be found simply by using the GitHub search bar. In 2016 Uber suffered a massive data leak affecting 57 million customers that revealed sensitive information such as names, email addresses and phone numbers. The source of the attack was leaked credentials found in a private GitHub repository. Existing tools are mostly regular-expression based and do not achieve good results in terms of accuracy and low false positive rate. This work presents an approach to detecting plain-text leaks in open-source projects that uses Machine Learning to lower the number of false positives. To preserve data privacy, the tool was evaluated on several synthetic-data environments, reaching good results in terms of leak detection. Finally, this approach is compared with the existing ones, emphasizing the adaptability of this solution. |
---|---|
Supervisors: | Cataldo Basile, Antonio Lioy |
Academic year: | 2019/20 |
Publication type: | Electronic |
Number of pages: | 85 |
Subjects: | |
Degree programme: | Master's degree programme in Computer Engineering (Ingegneria Informatica) |
Degree class: | New system > Master's degree > LM-32 - COMPUTER ENGINEERING (INGEGNERIA INFORMATICA) |
Collaborating companies: | SAP Labs France |
URI: | http://webthesis.biblio.polito.it/id/eprint/14463 |