Chiara Van Der Putten
Transforming Data Flow: Generative AI in ETL Pipeline Automatization.
Supervisor: Daniele Apiletti. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives. File size: 5MB.
Abstract: | In the evolving landscape of enterprise data management, automating the creation of ETL pipelines emerges as a crucial objective. This master's thesis delves into employing state-of-the-art Artificial Intelligence techniques to streamline the integration and transformation of enterprise data, aiming to minimize the manual effort in developing data processing workflows. In partnership with Mediamente Consulting Srl, the study focuses on designing and implementing a system that efficiently addresses user requests within the ETL framework, leveraging cutting-edge technology. To this end, a tailored algorithm was designed to process user requests, employing sophisticated data representation techniques to encapsulate the semantic nuances and contextual cues embedded in these queries. This distributed representation of user requests serves as the basis for identifying the most suitable ETL solution from a repertoire of available options. Subsequently, the identified solution is refined through a generative model, which further aligns it with the original user specification, thereby improving the congruence and relevance of the final result. In the formulation of the proposed pipeline, a selected set of embedding techniques and generative models were evaluated and tested, culminating in the identification of the most efficient methodologies that could provide answers most attuned to user needs, as clarified in the thesis. This approach results in an initial ETL solution closely aligned with the user's needs, substantially reducing the manual work usually associated with creating ETL workflows, although not eliminating it. |
---|---|
Supervisors: | Daniele Apiletti |
Academic year: | 2023/24 |
Publication type: | Electronic |
Number of Pages: | 83 |
Subjects: | |
Degree programme: | Master's degree programme in Data Science and Engineering |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Collaborating companies: | Mediamente Consulting srl |
URI: | http://webthesis.biblio.polito.it/id/eprint/30855 |
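
The abstract describes a two-step flow: a user request is turned into a vector representation and matched against a repertoire of existing ETL solutions, and the best match is then refined by a generative model. The sketch below illustrates only the shape of that flow and is not the thesis implementation. Everything in it is an assumption made for illustration: the catalogue entries, the function names, the bag-of-words vector used in place of a learned embedding, and the `refine` stub that merely builds a prompt where a real system would call a generative model.

```python
import math
import re
from collections import Counter

# Hypothetical repertoire of existing ETL solutions (illustrative names and descriptions).
CATALOGUE = {
    "load_csv_to_table": "load a csv file into a database table",
    "join_customers_orders": "join customers and orders tables on customer id",
    "aggregate_daily_sales": "aggregate sales amounts by day",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(request, catalogue):
    """Return (name, score) of the catalogue entry most similar to the request."""
    query = embed(request)
    scored = ((name, cosine(query, embed(desc))) for name, desc in catalogue.items())
    return max(scored, key=lambda pair: pair[1])


def refine(request, solution_name):
    """Placeholder for the generative step: only builds the prompt text
    that a generative model would receive (assumption, no model is called)."""
    return f"Adapt the ETL solution '{solution_name}' to satisfy this request: {request}"


def propose(request, catalogue=CATALOGUE):
    """Retrieve the closest existing solution, then prepare its refinement."""
    name, score = retrieve(request, catalogue)
    return name, score, refine(request, name)


if __name__ == "__main__":
    name, score, prompt = propose("aggregate the sales by day")
    print(name, round(score, 2))
    print(prompt)
```

With this toy catalogue, the request "aggregate the sales by day" selects `aggregate_daily_sales`. Swapping `embed` for a learned embedding model and `refine` for a call to a generative model would leave the retrieve-then-refine structure unchanged.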