Data Quality and Observability for Data Mesh Paradigm

Francesco Lipani

Data Quality and Observability for Data Mesh Paradigm.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in ICT for Smart Societies (Ict Per La Società Del Futuro), 2023

Abstract:

This thesis aimed to create a data quality and observability framework for the data mesh paradigm, since the combination of observability and data quality techniques ensures the fundamental concept of trustworthiness. The framework was developed to serve as a guide for the entire data intelligence business area of NTT DATA and was applied to a case study using sample clickstream data. The work began with a detailed study of data quality techniques and processes covering the entire data lifecycle, from collection to publication. Libraries such as Datafold, dbt tests, Great Expectations, and Deequ were then evaluated to determine which could best meet the data quality needs of the case study, considering factors such as ease of use, the features offered, and compatibility with the project's data sources and tools. Ultimately, a solution was developed that uses the best library identified to automate the creation of key performance indicators (KPIs) that assess data quality along ten dimensions (accuracy, semantics, consistency, etc.). The thesis also delved into observability, analyzing the available solutions and proposing a practical one based on a specific library that facilitates the collection of logs, metrics, and traces. This library makes it possible to connect notebooks developed in PySpark on Databricks to Azure Application Insights and to create graphical dashboards and query tables for monitoring pipeline execution.

Supervisors: Paolo Garza
Academic year: 2022/23
Publication type: Electronic
Number of pages: 84
Additional information: Confidential thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in ICT for Smart Societies (Ict Per La Società Del Futuro)
Degree class: New regulations > Master's degree > LM-27 - TELECOMMUNICATIONS ENGINEERING
Collaborating companies: NTT DATA Italia
URI: http://webthesis.biblio.polito.it/id/eprint/26878