Optimizing ETL Processes: automation for SSIS Packages

Mihaita Andrei Boboc

Optimizing ETL Processes: automation for SSIS Packages.

Supervisor: Alessandro Fiori. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Extract, Transform, Load (ETL) processes are an essential part of building data warehouses. They allow data from many different systems to be collected, cleaned, and stored in a form that can be used for analysis. In Microsoft SQL Server Integration Services (SSIS), these processes are usually created by manually configuring packages. Although this approach provides flexibility, it is repetitive, slow, and prone to errors, especially when the system must handle a large number of sources or when several developers are working on the same project. This thesis proposes a metadata-driven automation framework written in Python to simplify this task. Instead of building each SSIS package separately by hand, the system reads metadata stored in structured, manually maintained Excel files and automatically generates packages in XML format. The metadata describes source and target tables, grouping information, and referential constraints. This information allows the system to generate packages for the staging layer (L0 level), the operational data storage layer (L1 level), and a pre-load step, a special level used to check referential integrity between fact and dimension tables. The solution is built around reusable XML templates, which are filled dynamically by the Python scripts. In this way, the packages follow a common structure defined by the company's integration framework. Tests carried out on real scenarios show that this method reduces development time, improves consistency, and lowers the risk of human error. Future changes in table structures or business rules can also be managed more easily, as they only require updates to the metadata files. The results confirm that metadata-driven automation can improve ETL development in SSIS, making it more reliable, easier to maintain, and more scalable.
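
The template-filling idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the XML template is a hypothetical, heavily simplified stand-in for a real SSIS package, the metadata rows are hard-coded where the thesis reads them from Excel files, and the column names (`source`, `target`, `layer`) and the function `render_package` are assumptions made only for this example.

```python
from string import Template
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

# Hypothetical, simplified template; a real SSIS package file is far richer.
PACKAGE_TEMPLATE = Template(
    "<Package name=$name layer=$layer>"
    "<Source table=$source/>"
    "<Target table=$target/>"
    "</Package>"
)

# Stand-in for rows read from the Excel metadata files (column names assumed).
METADATA = [
    {"source": "crm.customers", "target": "stg.customers", "layer": "L0"},
    {"source": "stg.customers", "target": "ods.customers", "layer": "L1"},
]


def render_package(row):
    """Fill the template from one metadata row and return the XML text."""
    name = f"{row['layer']}_{row['target'].replace('.', '_')}"
    xml_text = PACKAGE_TEMPLATE.substitute(
        # quoteattr escapes special characters and adds the surrounding quotes.
        name=quoteattr(name),
        layer=quoteattr(row["layer"]),
        source=quoteattr(row["source"]),
        target=quoteattr(row["target"]),
    )
    ET.fromstring(xml_text)  # raises ParseError if the result is not well-formed
    return xml_text


# One generated package per metadata row, keyed by package name.
packages = {}
for row in METADATA:
    xml_text = render_package(row)
    packages[ET.fromstring(xml_text).get("name")] = xml_text
```

Under these assumptions, adding a new table means adding one metadata row rather than editing a package by hand, which is the maintenance benefit the abstract claims.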

Supervisors: Alessandro Fiori
Academic year: 2025/26
Publication type: Electronic
Number of pages: 92
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New organization > Master's degree > LM-32 - COMPUTER ENGINEERING
Joint supervision institution: Ecole Centrale de Lille (FRANCE)
Collaborating companies: Mediamente Consulting srl
URI: http://webthesis.biblio.polito.it/id/eprint/37831