Federico Germinario
Identification of hard-coded secrets in GitHub through NLP-based scanners.
Supervisor: Giuseppe Rizzo. Politecnico di Torino, Master's degree course in Data Science and Engineering, 2023
Abstract
The exposure of hard-coded credentials in source code is listed among the most dangerous vulnerabilities, because it can allow an attacker to gain unauthorized access to internal and external services. The automatic identification of secrets in public and private repositories remains a challenging problem, due to the diverse nature of credentials and code snippets and the lack of specific, reliable benchmark datasets. Previous work has focused on regular expressions and entropy-based approaches to discover a limited set of specific structured strings with distinct formats, such as API keys, while ignoring unstructured credentials such as passwords.
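To make the limitation of the regex- and entropy-based baselines concrete, the following is a minimal sketch of such a scanner. It is not the method proposed in this thesis. The AWS-style key pattern (`AKIA` followed by 16 uppercase letters or digits), the string-literal pattern and the threshold of 3.5 bits per character are all illustrative assumptions, and the names `scan_line` and `ENTROPY_THRESHOLD` are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative regex for one structured secret format:
# an AWS-style access key ID ("AKIA" + 16 uppercase letters or digits).
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Quoted string literals of at least 8 characters, e.g. token = "..."
STRING_LITERAL_RE = re.compile(r"""["']([^"']{8,})["']""")

# Hypothetical threshold in bits per character; real scanners tune this value.
ENTROPY_THRESHOLD = 3.5


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def scan_line(line: str) -> list[str]:
    """Return candidate secrets found in one line of source code."""
    # 1) Regex stage: strings matching a known structured format.
    findings = AWS_KEY_RE.findall(line)
    # 2) Entropy stage: any other literal that looks "random enough".
    for literal in STRING_LITERAL_RE.findall(line):
        if literal not in findings and shannon_entropy(literal) > ENTROPY_THRESHOLD:
            findings.append(literal)
    return findings


if __name__ == "__main__":
    for src in (
        'aws_key = "AKIAIOSFODNN7EXAMPLE"',
        'token = "f9Kq2LxZ8pVt4Rw7YbNc"',
        'password = "sunshine1"',
    ):
        print(src, "->", scan_line(src))
```

In this toy run the structured key and the random-looking token are flagged, while the human-chosen password `sunshine1` (about 2.7 bits per character) is missed. Since an n-character string can have at most log2(n) bits of entropy per character, short passwords can never reach such a threshold, which is one way unstructured credentials escape these baselines.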
In this work, we propose an NLP-based solution to identify both structured and unstructured hard-coded credentials in source code written in various programming languages.
