Data contracts as a quality enforcement tool under a Data Mesh architecture

Sergio Andres Mejia Tovar

Data contracts as a quality enforcement tool under a Data Mesh architecture.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2023

PDF (Tesi_di_laurea) - Thesis
Restricted to: Repository staff only until 27 October 2024 (embargo date).
License: Creative Commons Attribution Non-commercial No Derivatives.

Data contracts are gaining traction in data engineering practice as a means of enhancing data quality during data ingestion. This thesis aims to establish a format specification for contracts and an associated system for enforcing them, targeting in particular source-aligned data products in a data mesh. The research addresses the pressing need in contemporary data engineering to maintain data quality as data production grows, proposing a proactive approach in which data producers adhere to predefined quality standards at the source. For this purpose, an application was designed to manage and validate contracts, using push-based data ingestion as the examined scenario. The study analyzes existing proposals for data contracts and introduces a contract object that facilitates quality agreements between data producers and consumers. These agreements rely on declarative schema and quality rules, from which validations are generated programmatically to ensure alignment and compliance. The system incorporates a declarative finite-state-machine workload management system to offer flexibility in adapting data validation processes to diverse use cases. The research and the subsequent software implementation show that defining a data contract requires a delicate balance between general standardization and domain-specific quality expectations, which in turn heavily influences the design of the enforcement system, since it must be configurable enough to meet this demand. Nonetheless, analysis of the implemented system leads to the conclusion that, despite the time overhead introduced by enforcement, agreeing on and enforcing data contracts enables faster correction on the producer side, ultimately raising data quality for consumers. This thesis was developed within the Research & Development unit of Agile Lab S.r.l., with plans to integrate it into Witboost, Agile Lab's main product.
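
The idea of generating validations from declarative rules and driving them through a declarative state machine can be illustrated with a minimal Python sketch. It is not the contract format or the enforcement system defined in the thesis: the contract layout, the rule keywords (`min`, `allowed`), the state names, and the functions `build_checks` and `run` are all hypothetical, chosen only to show the general pattern.

```python
from typing import Any, Callable

Record = dict[str, Any]
Check = Callable[[Record], bool]

# Hypothetical contract: field names, types, and rule keywords are illustrative only.
CONTRACT = {
    "schema": {"order_id": int, "amount": float, "currency": str},
    "quality": [
        {"field": "amount", "rule": "min", "value": 0.0},
        {"field": "currency", "rule": "allowed", "value": ["EUR", "USD"]},
    ],
}

# Hypothetical declarative workflow:
# state -> (check to run, next state on pass, next state on fail).
# States without an entry (ACCEPTED, REJECTED) are terminal.
WORKFLOW = {
    "RECEIVED": ("schema", "SCHEMA_OK", "REJECTED"),
    "SCHEMA_OK": ("quality", "ACCEPTED", "REJECTED"),
}


def build_checks(contract: dict) -> dict[str, Check]:
    """Generate validation functions from the declarative contract."""
    schema = contract["schema"]
    rules = contract["quality"]

    def schema_check(rec: Record) -> bool:
        # Exactly the declared fields, each with the declared type.
        return set(rec) == set(schema) and all(
            isinstance(rec[name], typ) for name, typ in schema.items()
        )

    def rule_ok(rec: Record, rule: dict) -> bool:
        value = rec[rule["field"]]
        if rule["rule"] == "min":
            return value >= rule["value"]
        if rule["rule"] == "allowed":
            return value in rule["value"]
        raise ValueError(f"unknown rule: {rule['rule']}")

    def quality_check(rec: Record) -> bool:
        return all(rule_ok(rec, rule) for rule in rules)

    return {"schema": schema_check, "quality": quality_check}


def run(record: Record, contract: dict = CONTRACT, workflow: dict = WORKFLOW) -> str:
    """Walk the record through the workflow until a terminal state is reached."""
    checks = build_checks(contract)
    state = "RECEIVED"
    while state in workflow:
        check_name, on_pass, on_fail = workflow[state]
        state = on_pass if checks[check_name](record) else on_fail
    return state
```

Under these assumptions, `run({"order_id": 1, "amount": 9.5, "currency": "EUR"})` returns `"ACCEPTED"`, while a record with a negative amount or a missing field ends in `"REJECTED"`. Because both the rules and the transitions are plain data, adapting the validation to a different use case means editing the two dictionaries rather than the code.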

Supervisor: Paolo Garza
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 102
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Agile Lab S.r.l.
URI: http://webthesis.biblio.polito.it/id/eprint/28473