Deep Reinforcement Learning for Portfolio Optimization

Gioele Scaletta

Supervisors: Luca Cagliero, Jacopo Fior. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Managing assets under uncertainty by exploiting market inefficiencies has significant implications for financial investors. Portfolio management, which involves asset selection, allocation, and monitoring, aims to maximize returns while mitigating risk, taking into account the financial goals, risk appetite, and investment horizon of each individual or institution. The risk-return trade-off, central to portfolio optimization, therefore hinges on the investor's preferences. Risk comprises systematic and unsystematic components, and diversification mitigates the latter. In imperfect markets, characterized by information asymmetries and frictions, investors hope to exploit inefficiencies, some of which arise from investors' own psychology, which sometimes leads to irrational behaviour. In this environment, building a customized and dynamically informed portfolio management strategy on mathematical predictions of future asset prices is often infeasible. This thesis focuses on automating portfolio management using Deep Reinforcement Learning (DRL). DRL algorithms are based on an agent’s interaction with an environment: the agent optimizes its decisions based on the feedback it receives. The lack of analytical techniques for portfolio optimization, together with these features of DRL, motivated the choice of this technique. DRL is suitable because it outputs investment actions directly, without the unrealistic presumption of predicting future prices, and thus partly sidesteps the complexity of modelling how the market works. The additional contributions of this project address the limitations of applying DRL to portfolio optimization by enhancing the model with sequence-modelling architectures and imitation learning. GTrXL, a transformer-based sequence-modelling architecture, was used with the aim of improving how past price information is embedded in the DRL algorithm’s state.
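
To make the idea of "directly outputting investment actions" concrete, the following is a minimal sketch, not the thesis's implementation. It assumes a long-only portfolio in which the agent's raw output is turned into weights with a softmax, and a reward equal to the one-step log return without transaction costs. The action space and reward are not specified in the abstract, so both are assumptions, and the function names and sample numbers are hypothetical.

```python
import math

def softmax(scores):
    """Map unconstrained scores to non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def step_reward(weights, prices_now, prices_next):
    """Log return of the portfolio over one step, ignoring transaction costs."""
    gross = sum(w * (p1 / p0)
                for w, p0, p1 in zip(weights, prices_now, prices_next))
    return math.log(gross)

# Hypothetical raw policy output for three assets (assumed, not from the thesis).
raw_action = [0.2, -1.0, 1.3]
weights = softmax(raw_action)

# Hypothetical prices at two consecutive steps.
reward = step_reward(weights, [100.0, 50.0, 20.0], [101.0, 49.0, 21.0])
```

In this sketch the agent never forecasts a price: it proposes an allocation, the environment reveals the next prices, and the resulting return is the feedback used for learning.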
Imitation learning consists of training an RL algorithm on information collected by observing an expert performing the task. Here, a dataset of expert actions was created by computing the optimal actions in hindsight, which made it possible to pre-train the DRL algorithm on the expert actions through imitation learning. The expert actions were also used in a third variant, which adapts the TD3-BC model to the portfolio allocation use case and enlarges the training set with a wider set of stocks than those used for trading. The goal of these last two contributions is to improve data efficiency: financial market data are limited, whereas DRL is effective only when large amounts of data are available. Historically, its main application has been game playing, where data can be generated endlessly.
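
The hindsight expert described above can be illustrated with a small sketch. The abstract does not say how the optimal actions were computed, so this example assumes the simplest possible expert: at each step it places all capital on the asset with the highest realized return over the next step, with no transaction costs. The helper names and the price series are hypothetical.

```python
def hindsight_expert_actions(prices):
    """prices: list of per-step price lists, one price per asset.

    Returns one one-hot weight vector per step t (t = 0 .. T-2), putting
    all capital on the asset with the highest return from t to t+1.
    """
    actions = []
    for now, nxt in zip(prices, prices[1:]):
        returns = [p1 / p0 for p0, p1 in zip(now, nxt)]
        best = max(range(len(returns)), key=returns.__getitem__)
        actions.append([1.0 if i == best else 0.0
                        for i in range(len(returns))])
    return actions

def bc_loss(policy_weights, expert_weights):
    """Mean squared error between policy and expert allocations (one step)."""
    n = len(expert_weights)
    return sum((p - e) ** 2
               for p, e in zip(policy_weights, expert_weights)) / n

# Hypothetical two-asset price series over three steps.
prices = [
    [100.0, 50.0],
    [101.0, 52.0],  # asset 1 rises 4%, asset 0 rises 1%
    [103.0, 51.0],  # asset 0 rises about 2%, asset 1 falls
]
expert = hindsight_expert_actions(prices)
```

Pre-training by imitation would then minimize a loss such as `bc_loss` between the policy's allocation and the expert's. TD3-BC follows a related idea by adding a behaviour-cloning term of this kind to the TD3 actor objective.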

Supervisors: Luca Cagliero, Jacopo Fior
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 108
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/31091