Reinforcement Learning for Hybrid/electric vehicle: Analysis and performance of reward functions in a real-time algorithm for P2-HEV

Pasquale Ciccullo

Supervisors: Ezio Spessa, Claudio Maino, Matteo Acquarone, Daniela Anna Misul. Politecnico di Torino, Master's degree programme in Automotive Engineering (Ingegneria Dell'Autoveicolo), 2023

PDF (Tesi_di_laurea) - Thesis, 3 MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

Conventional vehicles with an internal combustion engine (ICE) provide good performance and a long operating range by exploiting the high energy density of petroleum fuels. However, conventional ICE vehicles suffer from poor fuel economy and contribute to air pollution. One immediate alternative is the Hybrid-Electric Vehicle (HEV). In an HEV, the introduction of one or more additional power sources increases the complexity of the powertrain architecture and offers additional degrees of freedom in controlling the power split between the power sources. In this energy-management scenario, Reinforcement Learning (RL) makes it possible to pursue a global optimization implemented in real time, unlike rule-based or optimization-based control strategies. In this work, a Deep Reinforcement Learning (DRL) algorithm, namely the Double Deep Q-Network (DDQN), is adopted to control the power split and the gear number. The DDQN aims to simultaneously minimize the fuel consumption (FC) and keep the state of charge (SOC) of the battery within its operating range. The case study is a parallel P2-HEV passenger car. The software consists of three elements: the Simulator, the Agent and the Environment Interface. The Simulator, implemented in MATLAB, represents the vehicle model and communicates with the Agent and the Environment Interface, both implemented in Python. The Environment Interface connects the communication components of the Agent with the physical simulation. The Agent is logically divided into three elements: the training algorithm, the approximator (an artificial neural network) and the exploration strategy, which is the component that allows the agent to obtain the optimal action-value function and find the optimal policy. The main goal of this work is to compare five different reward functions in order to show how this crucial function affects the performance of the algorithm. The DDQN algorithm and the reward functions are analysed on four real driving cycles (CLUST), covering a wide range of vehicle driving scenarios. The results show how a proper calibration of the reward coefficients improves the effectiveness of the reward function, achieving better performance and lower fuel consumption. The results are also compared with the Equivalent Consumption Minimization Strategy (ECMS), used as a benchmark energy-management strategy.
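To make the reward trade-off described above concrete, the following short Python sketch (Python being the language of the Agent) combines a fuel-consumption term with a state-of-charge penalty. It is a minimal illustration under stated assumptions: the function name compute_reward, the coefficients alpha and beta, the SOC target and bounds, and the out-of-range penalty are all hypothetical and do not reproduce any of the five reward formulations evaluated in the thesis.

def compute_reward(fuel_rate_g_s: float,
                   soc: float,
                   soc_target: float = 0.6,
                   soc_min: float = 0.4,
                   soc_max: float = 0.8,
                   alpha: float = 1.0,
                   beta: float = 10.0) -> float:
    """Illustrative reward: penalize fuel use and SOC drift (hypothetical coefficients).

    fuel_rate_g_s : instantaneous fuel consumption [g/s]
    soc           : current battery state of charge, in [0, 1]
    """
    # Fuel term: the less fuel burned in this time step, the higher the reward.
    fuel_penalty = alpha * fuel_rate_g_s

    # SOC term: quadratic penalty for drifting from the target,
    # plus a fixed penalty if the admissible window is violated.
    soc_penalty = beta * (soc - soc_target) ** 2
    if soc < soc_min or soc > soc_max:
        soc_penalty += 100.0  # discourage leaving the operating SOC range

    return -(fuel_penalty + soc_penalty)

Calibrating alpha and beta shifts the agent's priority between fuel minimization and charge sustaining, mirroring the reward-coefficient calibration that the abstract identifies as decisive for the algorithm's performance.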

Supervisors: Ezio Spessa, Claudio Maino, Matteo Acquarone, Daniela Anna Misul
Academic year: 2022/23
Publication type: Electronic
Number of pages: 71
Subjects:
Degree programme: Master's degree programme in Automotive Engineering (Ingegneria Dell'Autoveicolo)
Degree class: New regulations > Master's degree > LM-33 - MECHANICAL ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/27490