Offline Reinforcement Learning for Smart HVAC Optimal Control

Filippo Cortese

Offline Reinforcement Learning for Smart HVAC Optimal Control.

Supervisors: Francesco Vaccarino, Luca Sorrentino. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Deep Reinforcement Learning provides a mathematical formalism for learning-based control: an agent learns by trial and error how to behave optimally in an environment. This online learning paradigm is also one of the biggest obstacles to its widespread adoption. In many settings, such as healthcare or autonomous driving, interaction between the agent and the environment is either impractical or too dangerous. Offline Reinforcement Learning tries to overcome this issue by proposing a new paradigm in which learning happens from a fixed batch of previously collected data. Removing the online interaction makes this data-driven approach scalable and practical, but it also introduces some issues for the learning process. The first is that learning relies completely on the composition of the static dataset: if the dataset does not cover enough high-reward regions, it may be impossible for the agent to learn how to behave optimally. The second is the overestimation of out-of-distribution actions. Actions that never appear in the data are prone to being overestimated by the agent, which, without reward feedback, cannot correct its wrong estimates. This thesis studies Offline RL approaches in depth, with a focus on algorithms that make minimal changes to state-of-the-art deep RL algorithms. It then evaluates this approach on a real-world scenario, smart HVAC control, where the available data is limited either in size or in exploration. To pursue these objectives, we started from a state-of-the-art continuous-action offline RL algorithm, called TD3-BC, and derived a discrete-action algorithm that we call TD4-BC. We compared the two algorithms on LunarLander, an environment that supports both continuous and discrete actions, and tested TD4-BC on the smart HVAC control task. Finally, an additional online fine-tuning approach to TD4-BC was tested on the HVAC environment.
The results show that TD4-BC performs comparably to TD3-BC on LunarLander and gives promising results on the HVAC task, especially with the addition of online fine-tuning. Overall, Offline RL proved to be a powerful paradigm for tackling both a well-known benchmark environment and an industry-related case, with ample room for future improvement.
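As a rough illustration of the "minimal changes" idea mentioned in the abstract, the sketch below computes the actor loss of the continuous-action baseline, TD3-BC, for one batch: the usual TD3 policy objective plus a behaviour-cloning penalty that keeps the policy close to the dataset actions. It is not the thesis's TD4-BC and is not taken from the thesis. The function name, argument names, and the plain-Python, no-autodiff setting are assumptions made for illustration, and the normalisation by the batch's mean absolute Q-value follows the publicly released TD3+BC reference code as far as I recall it.

```python
from statistics import fmean


def td3_bc_actor_loss(q_values, policy_actions, data_actions, alpha=2.5):
    """Illustrative TD3-BC style actor loss for one batch (hypothetical helper).

    q_values:       Q(s, pi(s)) for each sample, as floats
    policy_actions: pi(s) for each sample, each a list of floats
    data_actions:   dataset action for each sample, each a list of floats
    alpha:          trade-off weight (2.5 is the commonly cited default)
    """
    # Rescale the Q term so that it is comparable in size to the BC term.
    lam = alpha / fmean(abs(q) for q in q_values)
    # Behaviour cloning: mean squared distance between policy and dataset
    # actions, which discourages out-of-distribution actions.
    bc_term = fmean(
        fmean((p - a) ** 2 for p, a in zip(pol, dat))
        for pol, dat in zip(policy_actions, data_actions)
    )
    # Minimising this loss maximises lam * Q while staying close to the data.
    return -lam * fmean(q_values) + bc_term


if __name__ == "__main__":
    loss = td3_bc_actor_loss(
        q_values=[1.0, 3.0],
        policy_actions=[[0.2, 0.0], [0.1, 0.4]],
        data_actions=[[0.0, 0.0], [0.1, 0.5]],
    )
    print(loss)
```

In a real training loop the same expression would be written with a differentiable tensor library so that gradients flow into the policy network; the scalar version here only shows how the two terms are combined.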
Supervisors:	Francesco Vaccarino, Luca Sorrentino
Academic year:	2021/22
Publication type:	Electronic
Number of pages:	101
Subjects:
Degree programme:	Master's degree programme in Data Science And Engineering
Degree class:	New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies:	ADDFOR S.p.A
URI:	http://webthesis.biblio.polito.it/id/eprint/22743
