A configurable data platform for streaming delta and full data ingestion

Michele Gallina

A configurable data platform for streaming delta and full data ingestion.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This thesis presents a cloud-based platform for managing large volumes of data securely, efficiently, and dynamically, so that it can meet both current and future needs. The work was carried out with Apache Spark on Databricks and included an analysis of which framework best suited each requirement. The platform runs entirely in the cloud, specifically on Microsoft Azure. Building on a cloud service makes it easy to scale up or down in response to changes in data volume or in the required processing time. Databricks provides a versatile platform based on Apache Spark that integrates natively with many frameworks, so a single system can cover needs ranging from data ingestion and processing to complex dashboards and AI models. In particular, the thesis builds a data platform for a security company that ingests data from two sources: a relational database and a network of IoT sensors. Once stored on the platform, the data undergo a quality improvement process before being made available for business use. The platform was designed to be as configurable as possible so that it is easy to extend. On the business side, three company requirements were selected, and a solution was proposed for each.
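
As a rough illustration of the difference between full and delta ingestion driven by configuration, the sketch below uses plain Python dictionaries in place of Spark tables. It is not taken from the thesis: `SourceConfig`, `ingest`, the field names, and the sample sources are all hypothetical, and the real platform runs on Apache Spark within Databricks rather than in-memory dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping

Row = Mapping[str, object]


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical per-source settings; field names are illustrative only."""
    name: str
    key: str   # column that uniquely identifies a record
    mode: str  # "full" replaces the target, "delta" merges changes into it


def ingest(target: Dict[object, Row], batch: Iterable[Row],
           cfg: SourceConfig) -> Dict[object, Row]:
    """Return the new state of the target table after applying one batch."""
    incoming = {row[cfg.key]: dict(row) for row in batch}
    if cfg.mode == "full":
        # Full ingestion: the batch is a complete snapshot and replaces the target.
        return incoming
    if cfg.mode == "delta":
        # Delta ingestion: only new or changed rows arrive; upsert them by key.
        merged = dict(target)
        merged.update(incoming)
        return merged
    raise ValueError(f"unknown ingestion mode: {cfg.mode!r}")


# Assumed example sources, loosely mirroring the two kinds named in the abstract.
sensors = SourceConfig(name="iot_sensors", key="sensor_id", mode="delta")
customers = SourceConfig(name="customers", key="customer_id", mode="full")

sensor_state = {
    1: {"sensor_id": 1, "status": "ok"},
    2: {"sensor_id": 2, "status": "ok"},
}
sensor_state = ingest(
    sensor_state,
    [{"sensor_id": 2, "status": "alarm"}, {"sensor_id": 3, "status": "ok"}],
    sensors,
)

customer_state = {10: {"customer_id": 10, "city": "Torino"}}
customer_state = ingest(
    customer_state,
    [{"customer_id": 11, "city": "Milano"}],
    customers,
)
```

In this sketch, supporting another source only requires a new `SourceConfig` entry, which is one simple way to read "configurable" and "easily extensible"; the thesis may achieve this differently.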

Supervisors: Paolo Garza
Academic year: 2024/25
Publication type: Electronic
Number of pages: 97
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies: Cluster Reply Srl
URI: http://webthesis.biblio.polito.it/id/eprint/34023