Autoscaling mechanisms for Google Cloud Dataproc

Luca Lombardo

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2019

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

In 2012, a Harvard Business Review article called data scientist "The Sexiest Job of the 21st Century". The story since then is well known: the Big Data movement took over, and demand for this new role grew rapidly. Today, companies try to squeeze their large amounts of data to gain new insights and improve their business. Cloud service providers such as Google and Amazon have met this market demand: nowadays it is easy for a company, and specifically for whoever is in charge of analyzing its data, to create a Hadoop cluster on the fly and deploy Spark jobs on it in a matter of minutes.

Unfortunately, it is not all as easy as it seems