Portfolio management and Deep learning: Reinforcement learning and Transformer applied to stock market data

Marco Gullotto

Portfolio management and Deep learning: Reinforcement learning and Transformer applied to stock market data.

Supervisors: Enrico Bibbona, Patrizia Semeraro. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2021

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	This thesis, developed during an internship at Add-For S.p.A., is a research thesis in the Fintech field that aims to design a new financial strategy for portfolio management and selection. The task is to make investment decisions that maximize profit over each investment period. It is a hard optimization problem: the agent must find the actions that select the most profitable assets over a period of time. The task is challenging because asset price series are difficult to represent, being non-stationary, noisy and subject to fluctuations. Deep learning has been used to address this problem, but the results so far are rather poor and not applicable to real problems. The goal of this thesis is to study the state of the art in portfolio management and selection, with particular emphasis on the Transformer architecture and reinforcement learning (RL), and to apply these models to real problems. The Transformer architecture is relatively new and relies primarily on the attention mechanism, which is used in place of recurrent or convolutional layers to learn both long-term and short-term dependencies. The attention mechanism was introduced in the context of Natural Language Processing, but it can easily be generalized to other types of data such as time series. Starting from the standard Transformer introduced by Vaswani et al., other variants have been proposed to increase performance in various settings. Although this architecture performs very well, it is complex to tune, and further problems arise when applying it to RL algorithms. For this reason, Parisotto et al. proposed a new variant of the Transformer that dramatically increases convergence speed and stability when applied to RL problems. This solution is widely accepted by the community and seems to perform well across different situations and problems.
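As an illustration of how attention generalizes from text to time series, the following is a minimal sketch of single-head scaled dot-product self-attention over a window of asset features. It is not taken from the thesis: the window length, feature count, model width and the random projection matrices `w_q`, `w_k`, `w_v` are all assumptions chosen only to make the example runnable.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(window, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    window: (T, n_features) array, one row per time step.
    Returns the attended representation (T, d_model) and the
    (T, T) matrix of attention weights.
    """
    q = window @ w_q
    k = window @ w_k
    v = window @ w_v
    # Every time step scores every other time step, near or far.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical sizes: 30 time steps, 4 features per step, width 8.
rng = np.random.default_rng(0)
T, n_features, d_model = 30, 4, 8
window = rng.normal(size=(T, n_features))
w_q, w_k, w_v = (rng.normal(size=(n_features, d_model)) for _ in range(3))

context, weights = self_attention(window, w_q, w_k, w_v)
```

Because each row of `weights` spans the whole window, a time step can draw on recent and distant observations alike, which is the sense in which attention covers both short-term and long-term dependencies.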
Other proposed variants look promising, but unlike the work of Parisotto et al. their outcomes have not been validated. Several RL algorithms from the literature can be used for this type of task. Due to the high stochasticity of the environment, only a model-free approach can be used: the full dynamics and reward function of the underlying Markov decision process are not known. Model-free RL therefore does not try to model the environment's inner workings and instead uses sampling and simulation to learn how to maximize the final reward. In previous Fintech works, many solutions use DDPG (deep deterministic policy gradient), proposed by Lillicrap et al. In the first attempt, therefore, the agent follows the DDPG structure for a continuous action space, using the Parisotto Transformer encoders, whose structure is robust to long-term dependencies and partial observability. Once training is complete, the agent examines the time series of different assets and is able to predict the best asset or set of assets in which to invest. That said, DDPG is a fairly old algorithm and more up-to-date solutions are available. For this reason, other RL algorithms are tested to verify whether newer solutions can improve on the baseline obtained with DDPG. The most important improvement is obtained with Soft Actor-Critic, proposed by Haarnoja et al. The results outperform the current state of the art, but the final method does not seem stable enough to be used in a real-case scenario.
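To make the idea of a continuous action space concrete, the sketch below shows one common way of turning an actor's raw output into portfolio weights and scoring a single step. The abstract does not specify the action parameterization or the reward, so the softmax mapping, the log-return reward and all names and numbers here are assumptions for illustration, not the thesis's method.

```python
import numpy as np

def portfolio_weights(scores):
    """Map raw actor scores to long-only weights that sum to 1.

    Assumption: a softmax is used; the thesis may use a different mapping.
    """
    shifted = scores - scores.max()  # numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def step_log_return(weights, price_relatives):
    """Log return of the portfolio over one period.

    price_relatives[i] is the closing price of asset i divided by its
    opening price for the period (hypothetical reward definition).
    """
    return float(np.log(weights @ price_relatives))

# Hypothetical actor output for three assets and one period of price moves.
scores = np.array([0.2, 1.5, -0.3])
price_relatives = np.array([1.01, 0.98, 1.00])

w = portfolio_weights(scores)
r = step_log_return(w, price_relatives)
```

In a model-free setting the agent never sees how `price_relatives` is generated; it only observes sampled outcomes such as `r` and adjusts its policy from them.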
Supervisors:	Enrico Bibbona, Patrizia Semeraro
Academic year:	2021/22
Publication type:	Electronic
Number of pages:	126
Subjects:
Degree programme:	Master's degree programme in Data Science And Engineering
Degree class:	New system > Master's degree > LM-32 - Computer Engineering
Collaborating companies:	ADDFOR S.p.A
URI:	http://webthesis.biblio.polito.it/id/eprint/20569