
Automatic web crawler for malicious websites classification

Allan Brunstein

Automatic web crawler for malicious websites classification.

Supervisors: Marco Mellia, Rodolfo Vieira Valentim, Idilio Drago. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This thesis proposes the use of a proactive web crawler to fight cybercrime, especially phishing and cybersquatting. According to the FBI, more than 300,000 United States citizens fell victim to phishing scams in 2022, with reported losses of 52 million US dollars. Criminals exploit the reputation of well-known brands to promote copycat websites, fake virus alerts and fake promotional messages, stealing money and personal data from unsuspecting users. The crawler collects data that can later be used as evidence and to notify authorities of fraudulent activity, helping them block malicious websites more quickly. The proposed approach is to monitor a list of potentially harmful domains on a daily basis, collecting DNS records, SSL certificates and WHOIS information, and taking a screenshot of the home page of each candidate. The system was developed in Python and C# and is made available in a Docker environment to facilitate reproducibility and scalability. Testing confirmed the efficiency of the system and the consistency and correctness of the collected data. Finally, with the help of data visualization tools, this thesis aims to extract enough data and create labels so that a classifier can be developed in the future to automatically label and classify domains based on the crawled data, improving online security and helping to stop fraud worldwide.
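As an illustration only, the per-domain collection step described in the abstract might be sketched in Python roughly as follows. This is not the thesis code: the function names (`resolve_addresses`, `fetch_certificate`, `crawl_domain`, `crawl_all`), the record layout and the collector registry are all assumptions made for this sketch. It covers only DNS resolution and TLS certificate retrieval using the standard library; WHOIS lookups, screenshots and the daily scheduling would need additional tooling and are omitted.

```python
import socket
import ssl
from datetime import datetime, timezone


def resolve_addresses(domain):
    """Return the sorted set of IP addresses the domain resolves to."""
    infos = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def fetch_certificate(domain, timeout=5.0):
    """Return the validated TLS certificate presented on port 443, as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((domain, 443), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=domain) as tls:
            return tls.getpeercert()


# Hypothetical collector registry (an assumption of this sketch). WHOIS and
# screenshot collectors are left out because they need third-party tools.
DEFAULT_COLLECTORS = {
    "dns": resolve_addresses,
    "certificate": fetch_certificate,
}


def crawl_domain(domain, collectors=DEFAULT_COLLECTORS):
    """Run every collector on one domain; a failure is recorded, not raised."""
    record = {
        "domain": domain,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "data": {},
        "errors": {},
    }
    for name, collect in collectors.items():
        try:
            record["data"][name] = collect(domain)
        except Exception as exc:  # e.g. unresolvable name, timeout, bad certificate
            record["errors"][name] = f"{type(exc).__name__}: {exc}"
    return record


def crawl_all(domains, collectors=DEFAULT_COLLECTORS):
    """One crawl pass over the watch list; daily scheduling is left external."""
    return [crawl_domain(domain, collectors) for domain in domains]
```

Recording failures instead of raising them keeps one unreachable domain from aborting the whole pass, and a failed lookup or rejected certificate may itself be useful evidence about a candidate domain. A pass over a watch list would be `crawl_all(["example.com"])`; the result depends on live network state, so no output is shown here.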

Supervisors: Marco Mellia, Rodolfo Vieira Valentim, Idilio Drago
Academic year: 2023/24
Publication type: Electronic
Number of pages: 55
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/30844