Politecnico di Torino

Automation of ETL Pipelines in DataStage

Alex Umberto Benedetti


Supervisor: Guido Albertengo. Politecnico di Torino, Master's degree programme in ICT for Smart Societies (ICT per la Società del Futuro), 2025

Full text: PDF (Tesi_di_laurea), thesis, 3 MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

In today's enterprise data management landscape, automating ETL (Extract, Transform, Load) pipelines has become a primary objective. This master's thesis explores the use of the latest Artificial Intelligence techniques to simplify data integration and transformation, with the aim of minimizing the human effort required to design and manage data-processing workflows. Conducted in partnership with Mediamente Consulting Srl, this research aims to design and implement a system that evaluates and uses advanced technologies to handle user requests effectively in an ETL data-flow context. The core process automates DataStage components, using XML templates to create and configure jobs dynamically from the interpreted user requests. Through custom scripts, the system automates the deployment and configuration of DataStage jobs, turning ETL setup from a manual, error-prone process into a more efficient and reliable automated procedure.

To interpret the requests, the system employs data representation techniques designed to capture the semantic nuances and contextual elements of the queries. These distributed representations form the basis for selecting the most appropriate ETL solution from a set of available options. The selected solution is then processed by a generative model, which adapts it to the original specifications and thereby improves the relevance and coherence of the final result. In developing the proposed pipeline, different embedding techniques and generative models were analyzed and tested; the most effective methods were selected based on their ability to provide answers that closely match the user's needs, as discussed in the thesis. Applying these methods yields an optimized ETL configuration that closely matches user needs and minimizes manual configuration in ETL tasks. This type of automation not only improves operational efficiency but also allows enterprises to respond more dynamically to changing data requirements.
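The retrieve-then-instantiate idea described in the abstract can be illustrated with a minimal sketch. It picks the candidate ETL solution whose embedding has the highest cosine similarity to the query embedding, then fills that solution's XML template with job parameters. Several simplifications are assumed here. The catalogue entries and their toy three-dimensional vectors are hypothetical; a real system would obtain embeddings from a trained model. The XML element and attribute names are a made-up simplified schema, not the actual DataStage export format. The generative adaptation step is not shown.

```python
import math
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Hypothetical catalogue: each candidate ETL solution pairs a toy description
# embedding with an XML job template in a simplified, made-up schema.
CATALOGUE = {
    "copy_table": {
        "embedding": [0.9, 0.1, 0.0],
        "template": '<Job name="$job_name"><Source table="$source"/>'
                    '<Target table="$target"/></Job>',
    },
    "aggregate_sales": {
        "embedding": [0.1, 0.8, 0.3],
        "template": '<Job name="$job_name"><Source table="$source"/>'
                    '<Aggregator/><Target table="$target"/></Job>',
    },
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_solution(query_embedding, catalogue):
    """Return the key of the catalogue entry most similar to the query."""
    return max(
        catalogue,
        key=lambda k: cosine_similarity(query_embedding, catalogue[k]["embedding"]),
    )


def render_job(template_xml, params):
    """Fill the XML template with escaped parameters and check well-formedness."""
    # Escape &, <, > and double quotes so values are safe inside attributes.
    safe = {k: escape(str(v), {'"': "&quot;"}) for k, v in params.items()}
    xml_text = Template(template_xml).substitute(safe)
    ET.fromstring(xml_text)  # raises ParseError if the result is malformed
    return xml_text


if __name__ == "__main__":
    # Toy embedding standing in for a request such as
    # "copy the customer table into the warehouse".
    query = [0.85, 0.2, 0.05]
    best = select_solution(query, CATALOGUE)
    job = render_job(
        CATALOGUE[best]["template"],
        {"job_name": "copy_customers", "source": "CUSTOMERS", "target": "DW_CUSTOMERS"},
    )
    print(best)
    print(job)
```

Keeping retrieval and template filling as separate functions mirrors the two stages named in the abstract, so either the embedding model or the template format could be swapped independently.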

Supervisor: Guido Albertengo
Academic year: 2024/25
Publication type: Electronic
Number of pages: 76
Subjects:
Degree programme: Master's degree programme in ICT for Smart Societies (ICT per la Società del Futuro)
Degree class: New regulations > Master's degree > LM-27 - Telecommunications Engineering (Ingegneria delle Telecomunicazioni)
Collaborating companies: Mediamente Consulting srl
URI: http://webthesis.biblio.polito.it/id/eprint/35326