Ferdinando Micco
Design and implementation of data science pipelines: a new paradigm based on analytics engineers.
Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2023
PDF (Tesi_di_laurea) - Thesis, 3MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: | Data is an increasingly critical strategic asset for companies of all sectors and sizes. Without a solid foundation of analytics engineering, a company risks poor-quality data, manual and fragmented processes, unreliable analysis, and long delivery times. Fortunately, there are tools that help implement analytics engineering best practices efficiently and at scale. One of these is dbt (data build tool), an open-source platform that simplifies the transformation, documentation, and testing of data models. The main focus of the thesis is to implement a modern pipeline solution that incorporates the best practices of analytics engineering. The inclusion of an analytics engineer within a data team represents a new paradigm in data-driven organizations. The study aims to show the feasibility of such a solution and the potential improvements from adopting it: increased efficiency, higher-quality data, and faster time to insights. Moreover, this project served as the starting point for a collaboration with a company that has specific requirements in the area of data quality. The collaboration provided valuable insights into the practical implementation of the pipeline and helped tailor the approach to the company's data quality needs. The proposed solution will use cutting-edge tools and techniques, such as dbt, to transform, document, and test data models. The whole architecture will be implemented as a serverless solution on a cloud computing platform to provide the required elasticity, scalability, and cost-effectiveness. The improved reliability of data analysis, coupled with faster time to insights, will allow organizations to make data-driven decisions more quickly and confidently. |
---|---|
Supervisors: | Paolo Garza |
Academic year: | 2022/23 |
Publication type: | Electronic |
Number of Pages: | 60 |
Subjects: | |
Degree programme: | Master's degree programme in Ingegneria Informatica (Computer Engineering) |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Collaborating companies: | DATA Reply S.r.l. con Unico Socio |
URI: | http://webthesis.biblio.polito.it/id/eprint/27737 |