
Babelle Tchoumi Yomi
Development and Orchestration of a Scalable and Efficient Automated Data Ingestion Workflows and Pipelines for Multi-Domain at MSC Technology Italia.
Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2025
PDF (Tesi_di_laurea)
- Thesis
License: Creative Commons Attribution Non-commercial No Derivatives. Download (5MB) |
Abstract: |
This thesis was conducted at MSC Technology Italia as part of an enterprise-wide initiative to modernize and automate data ingestion processes across key business domains, including Finance, CRM, Logistics, Operations, and Liners. In the current hybrid architecture, Informatica PowerCenter orchestrates ingestion workflows from various structured data sources, while Azure Synapse Analytics supports the design and execution of cloud-native data pipelines, particularly for systems based on Oracle. This division reflects MSC’s progressive shift from traditional ETL to scalable, cloud-based processing. The project also aligns with MSC’s strategic objective to migrate toward Microsoft Fabric, a unified analytics platform built on a lakehouse model. As part of this transition, Fabric pipelines were developed using notebooks, PySpark, Script activities, and Copy activities to demonstrate modern ingestion and transformation capabilities. Fabric’s integration with Power BI further supports real-time analytics and self-service reporting. To improve efficiency and reduce redundancy, an incremental loading mechanism based on watermarks was implemented. It ensures that only new or updated records are processed, which significantly improves system performance. The final result is a flexible and scalable data ingestion framework that integrates PowerCenter, Azure Synapse, and Microsoft Fabric for ingestion; Dagster for validation and reconciliation; and Automic for scheduling and monitoring. CI/CD pipelines were also introduced via Azure DevOps to automate deployments and ensure version consistency across environments. This architecture enables MSC Technology Italia to migrate progressively toward a modern, unified data platform while maintaining operational continuity, performance, and trust in data processes. |
---|---|
Supervisors: | Paolo Garza |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 77 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science And Engineering |
Degree class: | New system > Master's degree > LM-32 - Computer Engineering |
Collaborating companies: | MSC TECHNOLOGY (ITALIA) S.R.L. |
URI: | http://webthesis.biblio.polito.it/id/eprint/36349 |
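
The abstract describes an incremental loading mechanism based on watermarks, in which only new or updated records are processed on each run. The following is a minimal sketch of that general idea in plain Python, not the thesis's actual implementation: the function `incremental_load`, the `updated_at` and `id` fields, and the in-memory `source` and `target` structures are all hypothetical names chosen for illustration.

```python
from datetime import datetime


def incremental_load(source_rows, target, watermark):
    """Upsert rows modified after `watermark` into `target`.

    Returns the new watermark and the number of rows processed.
    All names here are illustrative assumptions, not taken from the thesis.
    """
    # Select only rows changed since the last successful run.
    changed = [row for row in source_rows if row["updated_at"] > watermark]
    for row in changed:
        # Upsert keyed on the primary key so updated records replace old versions.
        target[row["id"]] = row
    # Advance the watermark to the latest change seen; keep it if nothing was new.
    new_watermark = max((row["updated_at"] for row in changed), default=watermark)
    return new_watermark, len(changed)


# Hypothetical source table with a last-modified timestamp per row.
source = [
    {"id": 1, "amount": 100, "updated_at": datetime(2025, 1, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2025, 1, 2)},
]
target = {}
watermark = datetime.min  # first run: everything counts as new

watermark, n_first = incremental_load(source, target, watermark)

# Simulate one updated record and one newly inserted record in the source.
source[0] = {"id": 1, "amount": 120, "updated_at": datetime(2025, 1, 3)}
source.append({"id": 3, "amount": 75, "updated_at": datetime(2025, 1, 4)})

watermark, n_second = incremental_load(source, target, watermark)
```

In this sketch the second run touches only the two changed rows and leaves the unchanged one alone. A production pipeline would typically persist the watermark in a control table and push the `updated_at > watermark` filter down to the source query rather than filtering in memory.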