Francesco Lipani
Data Quality and Observability for Data Mesh Paradigm.
Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in ICT for Smart Societies (ICT per la Società del Futuro), 2023
Abstract: |
This thesis aimed to create a data quality and observability framework for the data mesh paradigm, since combining observability with data quality techniques supports trustworthiness, a fundamental concept of that paradigm. The framework was developed to serve as a guide for the entire data intelligence business area of NTT Data and to be applied to a case study using sample clickstream data. The work began with a detailed study of data quality techniques and processes across the entire data lifecycle, from collection to publication. Libraries and tools such as Datafold, dbt tests, Great Expectations, and Deequ were then evaluated to determine which could best meet the data quality needs of the case study. The evaluation considered ease of use, the features offered, and compatibility with the project's data sources and tools. Ultimately, a solution was developed that uses the library identified as the best fit to automate the creation of key performance indicators (KPIs) that assess data quality along ten dimensions (accuracy, semantics, consistency, etc.). The thesis also examined observability, analyzing the available solutions and proposing a practical one based on a specific library that facilitates the collection of logs, metrics, and traces. This library made it possible to connect notebooks developed in PySpark on Databricks to Azure Application Insights, and to create graphical dashboards and query tables for monitoring pipeline execution. |
---|---|
Supervisors: | Paolo Garza |
Academic year: | 2022/23 |
Publication type: | Electronic |
Number of Pages: | 84 |
Additional Information: | Confidential thesis. Full text not available |
Subjects: | |
Degree programme: | Master's degree programme in ICT for Smart Societies (ICT per la Società del Futuro) |
Degree class: | New organization > Master science > LM-27 - TELECOMMUNICATIONS ENGINEERING |
Collaborating companies: | NTT DATA Italia |
URI: | http://webthesis.biblio.polito.it/id/eprint/26878 |