Aurora Leone
Design and Implementation of a Metadata-Driven Enterprise ETL Framework on Databricks.
Supervisor: Daniele Apiletti. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2026
PDF (Tesi_di_laurea), Thesis, 4MB.
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Modern enterprises require unified platforms that integrate data engineering, governance, and analytics to support strategic decision-making. This thesis evaluates Databricks as a cloud-native environment for implementing and extending a structured, metadata-driven ETL framework developed by Data Science Operations. The framework follows a modular three-layer architecture: raw data ingestion (L0), integration and quality control (L1), and analytical publication (L2). It was fully re-engineered on Databricks using Delta Lake for transactional consistency and Databricks Jobs for orchestration. Metadata-driven automation, dependency-aware scheduling, and comprehensive auditing mechanisms ensure traceability, reproducibility, and operational efficiency across the entire pipeline. Beyond traditional ETL operations, the study demonstrates integration with advanced analytics workflows.
A machine learning model was developed to estimate the probability of credit risk deterioration using the curated datasets produced by the framework.
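The metadata-driven, dependency-aware scheduling mentioned in the abstract can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical metadata record per step (`StepMetadata`, carrying a layer tag L0/L1/L2 and its upstream steps) and an assumed rule that a step may depend only on steps in the same or a lower layer. Steps are grouped into "waves" that could run in parallel, a failed step causes its downstream steps to be skipped, and every outcome is written to an audit list. All names here are illustrative; this does not use Databricks Jobs or Delta Lake APIs and is not the framework's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from graphlib import TopologicalSorter

# Assumed ordering of the three layers: ingestion < integration/quality < publication.
LAYER_RANK = {"L0": 0, "L1": 1, "L2": 2}


@dataclass(frozen=True)
class StepMetadata:
    """Hypothetical metadata row describing one pipeline step."""
    name: str
    layer: str                      # "L0", "L1" or "L2"
    depends_on: tuple[str, ...] = ()


def validate(steps):
    """Check that dependencies exist and never point to a higher layer."""
    by_name = {s.name: s for s in steps}
    for s in steps:
        for dep in s.depends_on:
            if dep not in by_name:
                raise ValueError(f"{s.name}: unknown dependency {dep!r}")
            if LAYER_RANK[by_name[dep].layer] > LAYER_RANK[s.layer]:
                raise ValueError(f"{s.name} ({s.layer}) depends on higher-layer step {dep!r}")
    return by_name


def execution_waves(steps):
    """Group steps into waves; every step in a wave has all dependencies in earlier waves."""
    validate(steps)
    sorter = TopologicalSorter({s.name: set(s.depends_on) for s in steps})
    sorter.prepare()                # raises graphlib.CycleError if the metadata has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the order deterministic
        waves.append(ready)
        sorter.done(*ready)
    return waves


def run_pipeline(steps, runner):
    """Run steps wave by wave and return one audit record per step."""
    by_name = validate(steps)
    status, audit = {}, []
    for wave_no, wave in enumerate(execution_waves(steps)):
        for name in wave:
            step = by_name[name]
            started = datetime.now(timezone.utc)
            error = None
            if any(status[d] != "SUCCEEDED" for d in step.depends_on):
                outcome = "SKIPPED"     # an upstream step did not succeed
            else:
                try:
                    runner(step)
                    outcome = "SUCCEEDED"
                except Exception as exc:
                    outcome, error = "FAILED", str(exc)
            status[name] = outcome
            audit.append({"step": name, "layer": step.layer, "wave": wave_no,
                          "status": outcome, "error": error,
                          "started_at": started.isoformat()})
    return audit
```

For example, two L0 loads feeding one L1 integration step and one L2 publication step yield three waves. On Databricks, each wave could plausibly map to job tasks with declared dependencies and the audit records to an appended Delta table; both mappings are assumptions, not details stated in the abstract.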
