Matteo Donadio
Declarative Data Pipelines: implementing a logical model through automated code generation.
Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024
PDF (Tesi_di_laurea) - Thesis (3MB)
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
The design and operation of data pipelines that extract, transform, and store large data sets are central to data engineering. This thesis, developed in collaboration with Agile Lab S.R.L., introduces a logical model that establishes a clear, standardized approach to data pipeline architecture: a structured framework for defining entities, their interrelationships, and the operational rules needed to build effective and reliable data pipelines. To bridge the gap between the theoretical model and practical implementation, a tool has also been implemented that automates the generation of executable pipeline code and is designed to work independently of any specific data management tool.
The tool takes a declarative programming approach: it generates Python code for Apache Airflow while remaining flexible enough to be adapted to other technologies as needed.
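To make the idea of declarative code generation concrete, the sketch below renders a small pipeline description into Airflow DAG source code. It is only an illustration under stated assumptions: the spec format (`PIPELINE_SPEC`, with `id`, `command`, and `depends_on` fields) and the function `generate_airflow_dag` are hypothetical and are not the thesis's logical model or tool. The generated text assumes Airflow 2.x-style imports (`airflow.operators.bash.BashOperator`, and the `schedule` argument of `DAG`); the generator itself uses only the Python standard library.

```python
from textwrap import indent

# Hypothetical declarative spec: field names and commands are illustrative
# assumptions, not the entities defined by the thesis's logical model.
PIPELINE_SPEC = {
    "name": "example_pipeline",
    "tasks": [
        {"id": "extract", "command": "echo extract", "depends_on": []},
        {"id": "transform", "command": "echo transform", "depends_on": ["extract"]},
        {"id": "load", "command": "echo load", "depends_on": ["transform"]},
    ],
}


def generate_airflow_dag(spec: dict) -> str:
    """Render a declarative pipeline spec as Airflow DAG source code (a string)."""
    task_ids = [task["id"] for task in spec["tasks"]]
    for task_id in task_ids:
        # Task ids become Python variable names in the generated file.
        if not task_id.isidentifier():
            raise ValueError(f"task id is not a valid identifier: {task_id!r}")

    body = []
    # One operator per declared task.
    for task in spec["tasks"]:
        body.append(
            f"{task['id']} = BashOperator("
            f"task_id={task['id']!r}, bash_command={task['command']!r})"
        )
    # One dependency edge per declared upstream task.
    for task in spec["tasks"]:
        for upstream in task["depends_on"]:
            if upstream not in task_ids:
                raise ValueError(f"unknown dependency: {upstream!r}")
            body.append(f"{upstream} >> {task['id']}")

    header = [
        "from datetime import datetime",
        "",
        "from airflow import DAG",
        "from airflow.operators.bash import BashOperator",
        "",
        f"with DAG(dag_id={spec['name']!r}, start_date=datetime(2024, 1, 1), "
        "schedule=None, catchup=False) as dag:",
    ]
    return "\n".join(header) + "\n" + indent("\n".join(body), "    ") + "\n"


generated_source = generate_airflow_dag(PIPELINE_SPEC)
print(generated_source)
```

Because the spec says only what the pipeline is (tasks and their dependencies) and not how to run it, a different rendering function could in principle target another orchestrator from the same description.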
