
Streamline machine learning projects to production using cutting-edge MLOps best practices on AWS

Alessandro Palladini

Streamline machine learning projects to production using cutting-edge MLOps best practices on AWS.

Supervisor: Daniele Apiletti. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

In its early years, machine learning was largely confined to academic research, where it had the opportunity to evolve. In recent years it has moved into industry, and today machine learning plays an increasingly central role in many fields, especially in the largest companies. This transition comes with challenges: every technological evolution, which in some cases can also be disruptive, requires changes to work practices and organization. Moreover, there are gaps between how academic research and real-world companies work. The objective of machine learning operations (MLOps) is to shorten, and make more reliable, the major phases that characterise the deployment and maintenance of machine learning models in production. It inherits some key practices from DevOps, such as continuous integration, continuous delivery and continuous deployment, while introducing practices unique to machine learning systems, such as continuous training and data versioning. In this work we built an MLOps framework and a set of practices on AWS services that can be used in various settings and by teams from different backgrounds. The framework broadens the horizons of MLOps by including newer strategies such as monitoring pipelines and the use of a feature store. The MLOps pipeline is the main pillar of our work: we built a highly modular and reliable pipeline that automates the entire machine learning lifecycle, bridging the gap between development and operations and enabling better collaboration and communication between the different teams operating in the system. The main difficulty, and the main focus of the first phase of our work, was developing the pipeline; once it was ready, we could focus on the pure machine learning experimentation phase of the different projects. This allows a model to be delivered to production much faster after its conception.
The two projects we present are in different settings: a classic binary classification task and a cutting-edge time series forecasting task, which is not yet present in the literature. In both projects we show how continuous training and delivery of models in production works: the monitoring pipeline checks at regular intervals whether there is drift between the statistics of the training dataset and those of the serving data, or whether model performance has decayed in terms of predefined metrics.
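
The monitoring check described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the choice of a standardized mean shift as the drift statistic, and both threshold values are assumptions made only for illustration. It shows one way a scheduled job might decide that retraining is needed, either because a serving feature has drifted away from its training statistics or because a tracked metric has fallen below a predefined floor.

```python
from statistics import mean, stdev


def mean_shift(train_values, serving_values):
    """Shift of the serving mean from the training mean,
    measured in training standard deviations (hypothetical drift statistic)."""
    spread = stdev(train_values)
    difference = abs(mean(serving_values) - mean(train_values))
    if spread == 0:
        # Constant training feature: any change at all counts as unbounded drift.
        return 0.0 if difference == 0 else float("inf")
    return difference / spread


def should_retrain(train_values, serving_values, metric,
                   drift_threshold=0.5, metric_floor=0.8):
    """Return True when either monitored condition fires.

    drift_threshold and metric_floor are illustrative defaults,
    not values taken from the thesis.
    """
    drifted = mean_shift(train_values, serving_values) > drift_threshold
    decayed = metric < metric_floor
    return drifted or decayed


if __name__ == "__main__":
    training = [1.0, 2.0, 3.0, 4.0, 5.0]
    serving = [4.0, 5.0, 6.0, 7.0, 8.0]
    print(should_retrain(training, serving, metric=0.9))
```

In a real deployment this decision would be evaluated on a schedule and per feature, and a positive result would trigger the training pipeline; those parts are outside the scope of this sketch.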

Supervisor: Daniele Apiletti
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 76
Subjects:
Degree programme: Master's degree programme in Data Science and Engineering
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: DATA Reply S.r.l. con Unico Socio
URI: http://webthesis.biblio.polito.it/id/eprint/22607