
Transforming Data Flow: Generative AI in ETL Pipeline Automatization

Chiara Van Der Putten

Supervisor: Daniele Apiletti. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

Full text: PDF (Tesi_di_laurea), 5 MB
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In the evolving landscape of enterprise data management, automating the creation of ETL (Extract, Transform, Load) pipelines has become a key objective. This master's thesis investigates how state-of-the-art Artificial Intelligence techniques can streamline the integration and transformation of enterprise data, with the aim of minimizing the manual effort required to develop data processing workflows. In partnership with Mediamente Consulting Srl, the study designs and implements a system that addresses user requests within the ETL framework. To this end, a tailored algorithm processes each user request, using embedding-based data representation techniques to capture the semantic nuances and contextual cues of the query. This distributed representation of the request serves as the basis for identifying the most suitable ETL solution from a repertoire of available options. The selected solution is then refined by a generative model, which aligns it more closely with the original user specification and thereby improves the relevance of the final result. In building the proposed pipeline, a selected set of embedding techniques and generative models was evaluated and tested, and the thesis identifies the methodologies that produced answers most closely attuned to user needs. The resulting approach yields an initial ETL solution closely aligned with the user's requirements, substantially reducing, though not eliminating, the manual work usually associated with creating ETL workflows.
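The abstract describes the retrieval step only at a high level: the user request is mapped to a distributed representation and compared against a repertoire of existing ETL solutions, and the best match is then passed to a generative model for refinement. The following is a minimal sketch of that retrieve-then-refine idea under explicit assumptions, not the thesis's actual implementation. A plain term-frequency vector stands in for the learned embedding models the thesis evaluated, cosine similarity is assumed as the matching score, and the names `embed`, `retrieve`, `build_refinement_prompt` and the three-entry `REPERTOIRE` are all hypothetical. The generative step is represented only by the prompt it might receive; no model is called.

```python
import math
import re
from collections import Counter

# Hypothetical repertoire: solution id -> natural-language description of an ETL job.
REPERTOIRE = {
    "load_sales_csv": "load daily sales csv files into the sales staging table",
    "dedup_customers": "remove duplicate customer records by email before loading",
    "aggregate_orders": "aggregate orders by month and region into a summary table",
}


def embed(text: str) -> Counter:
    """Stand-in for a learned sentence embedding (assumption): term frequencies of lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(request: str, repertoire: dict[str, str]) -> tuple[str, float]:
    """Return the id of the stored solution most similar to the request, with its score."""
    if not repertoire:
        raise ValueError("repertoire is empty")
    query = embed(request)
    scores = {sid: cosine(query, embed(desc)) for sid, desc in repertoire.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def build_refinement_prompt(request: str, solution: str) -> str:
    """Hypothetical prompt asking a generative model to adapt the retrieved solution."""
    return (
        "Adapt the following ETL job so that it satisfies the user request.\n"
        f"User request: {request}\n"
        f"Retrieved ETL job: {solution}\n"
    )


if __name__ == "__main__":
    request = "deduplicate customer records using email"
    best, score = retrieve(request, REPERTOIRE)
    print(best)  # → dedup_customers
    print(build_refinement_prompt(request, REPERTOIRE[best]))
```

Swapping `embed` for a real neural encoder would leave `retrieve` unchanged, which is the point of separating the representation step from the matching step; the refinement prompt is likewise only one possible way to hand the retrieved solution to a generative model.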

Supervisor: Daniele Apiletti
Academic year: 2023/24
Publication type: Electronic
Number of pages: 83
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New regulations > Master's degree > LM-32 - Computer Engineering
Collaborating companies: Mediamente Consulting srl
URI: http://webthesis.biblio.polito.it/id/eprint/30855