Gpu accelerated ETL processes: a faster way to deal with big data

Edoardo Lardizzone

Gpu accelerated ETL processes: a faster way to deal with big data.

Supervisor: Daniele Apiletti. Politecnico di Torino, Master's degree course in Data Science and Engineering, 2022

Abstract

Over the last decade, data have become the most valuable assets in business and research, and using them is essential for any company that wants to keep up with the times. Because new methods for extracting and analyzing data appear every day, data volumes keep growing, and older frameworks are becoming obsolete in terms of execution time. The goal of this work is to survey the state of the art in the use of GPUs within a data pipeline. The focus is on the ETL (extract, transform, load) part of the pipeline: GPU acceleration of the Machine Learning part is already well advanced, whereas the preprocessing phase is still mainly done on CPUs.

Since the use of GPUs to accelerate the Deep Learning phase has been a topic of discussion for many years, and very good techniques have already been developed, I do not cover them here: they fall outside the scope of this work.