Politecnico di Torino

Offline reinforcement learning for hybrid vehicles energy consumption optimization

Federico Gambassi

Supervisors: Francesco Vaccarino, Luca Sorrentino, Rosalia Tatano. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Matematica (Master's degree in Mathematical Engineering), 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

In the last decade, Reinforcement Learning (RL) has been applied with great success to a variety of practical tasks, ranging from robotics and autonomous driving to AI for video games and strategic games such as chess. A key element in the success of these methods is undoubtedly the integration of RL with Deep Learning: thanks to advances in computational power, it is now possible to train deep neural network approximators that learn patterns from unstructured data such as images or text. Very recently, however, a new paradigm has emerged from traditional RL, called Offline Reinforcement Learning (Offline RL), in which the agent is not allowed to interact with the environment at all and can only learn from previously collected datasets. The need for this new branch of RL stems from the fact that, in many practical applications, learning from scratch in the real environment can be infeasible, or even dangerous for the agent and its surroundings. The purpose of this work is to experiment with two offline RL algorithms, namely CQL and COMBO, on a problem concerning the energy consumption of a hybrid vehicle along a fixed trajectory. To assess the performance of these methods, we also established a suitable baseline, which is essentially a plain adaptation of the online DQN algorithm to offline RL. The three agents (including the baseline) were trained and tested in a simulated environment written in Python/MATLAB and developed by PoliTo and AddFor S.p.A. The training datasets were gathered using different online RL agents trained in the same framework.
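The baseline mentioned above, DQN applied to a fixed dataset, boils down to running the Q-learning update over stored transitions without ever querying the environment. The following is a minimal tabular sketch, not the thesis implementation: it assumes discrete states and actions and a dataset given as a list of `(state, action, reward, next_state, done)` tuples, and the function name `offline_q_learning` and its hyperparameters are hypothetical. A real DQN would replace the table with a neural network and compute targets with a separate target network.

```python
from collections import defaultdict


def offline_q_learning(dataset, n_actions, gamma=0.99, lr=0.1, epochs=200):
    """Hypothetical tabular stand-in for an offline DQN baseline.

    dataset: list of (state, action, reward, next_state, done) tuples collected
    beforehand (e.g. by online agents). The environment is never queried here.
    """
    # Q-table: one list of action values per state, initialised to zero
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(epochs):
        for s, a, r, s_next, done in dataset:
            # Bootstrapped target with a greedy max over next-state actions, as in DQN.
            # Offline, this max may select actions that never appear in the data.
            target = r if done else r + gamma * max(q[s_next])
            # Move Q(s, a) a step of size lr towards the target
            q[s][a] += lr * (target - q[s][a])
    return q
```

For example, with `data = [(0, 1, 1.0, 1, True), (0, 0, 0.0, 1, True)]`, calling `offline_q_learning(data, n_actions=2)` drives `q[0][1]` to about 1.0 while `q[0][0]` stays at 0.0. The `max` over next-state actions is the weak point of this plain offline adaptation: it can bootstrap from actions that never appear in the dataset, whose values are never corrected by data.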
CQL is a model-free algorithm, meaning that the data are used only to learn how good certain state-action pairs are; COMBO, on the other hand, is a model-based algorithm, meaning that it first tries to learn a model of the environment from the available data. Before presenting the results, we provide a theoretical introduction to classic RL, as well as the theoretical foundations and motivation behind the algorithms used.
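As a model-free method, CQL (Conservative Q-Learning) keeps the TD loss of Q-learning but adds a regularizer that pushes down the Q-values of all actions, weighted by a softmax, while pushing up the values of the actions actually present in the dataset. The sketch below is a tabular simplification under stated assumptions, not the thesis implementation: it uses the CQL(H) form of the penalty, α(log Σ_b exp Q(s, b) − Q(s, a)) with a the dataset action, takes plain gradient steps on a table, and the names `cql_tabular` and `softmax` as well as all hyperparameters are hypothetical. COMBO applies a similar conservative penalty, but to state-action pairs generated by rollouts of its learned dynamics model; that model-based part is not shown.

```python
import math
from collections import defaultdict


def softmax(values):
    """Numerically stable softmax over a list of action values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cql_tabular(dataset, n_actions, alpha=1.0, gamma=0.99, lr=0.1, epochs=500):
    """Hypothetical tabular CQL(H) sketch (a deep version would use a neural network).

    Loss per transition: 0.5 * (Q(s,a) - target)^2
                         + alpha * (logsumexp_b Q(s,b) - Q(s,a)).
    """
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(epochs):
        for s, a, r, s_next, done in dataset:
            target = r if done else r + gamma * max(q[s_next])
            td_error = q[s][a] - target
            probs = softmax(q[s])  # gradient of logsumexp w.r.t. Q(s, .)
            for b in range(n_actions):
                # Conservative term: pushes every action down in proportion to its
                # softmax weight and pushes the dataset action a back up.
                grad = alpha * (probs[b] - (1.0 if b == a else 0.0))
                if b == a:
                    grad += td_error  # gradient of the squared TD loss
                q[s][b] -= lr * grad
    return q
```

With a dataset that only ever contains action 0 in state 0, e.g. `[(0, 0, 1.0, 1, True)]`, the unseen action 1 ends up with a negative value while action 0 stays close to its observed return of 1.0, so the greedy policy sticks to the behaviour seen in the data.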

Supervisors: Francesco Vaccarino, Luca Sorrentino, Rosalia Tatano
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 81
Degree programme: Corso di laurea magistrale in Ingegneria Matematica (Master's degree in Mathematical Engineering)
Degree class: New organization > Master of Science > LM-44 - Mathematical Modelling for Engineering
Collaborating company: ADDFOR S.p.A.
URI: http://webthesis.biblio.polito.it/id/eprint/21938