Efficient Transformer attentions in time series forecasting

Andrea Arcidiacono

Efficient Transformer attentions in time series forecasting.

Supervisors: Francesco Vaccarino, Rosalia Tatano. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2022

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Transformer-based architectures are neural network architectures originally developed for natural language processing. The key innovation of these state-of-the-art architectures is the self-attention mechanism. These models have been deployed in several settings beyond natural language, including videos and images. However, they are hard to scale up for industrial applications because the time and memory complexity of the attention mechanism is quadratic in the sequence length. Consequently, there has been extensive research into new variants of these architectures that approximate the quadratic-cost attention matrix, making the models more efficient and more lightweight. This thesis focuses on analyzing the recently proposed efficient attention mechanisms of Performer, BigBird and Informer and applying them to the task of time series forecasting.
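To make the quadratic cost concrete, the following is a minimal sketch of standard scaled dot-product self-attention in plain Python. It is an illustration only and is not taken from the thesis: the function names, the toy dimensions, and the use of the raw input as queries, keys and values (with no learned projections) are simplifying assumptions. The point is that the score matrix holds one entry per pair of time steps, so its size grows as n × n.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def vanilla_self_attention(queries, keys, values):
    """Scaled dot-product attention over n time steps.

    Each argument is a list of n vectors (lists of floats).
    Returns the attended outputs and the n x n weight matrix.
    """
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    # One score per (query step, key step) pair: this is the n x n matrix
    # that the efficient variants try to approximate or sparsify.
    scores = [
        [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        for q in queries
    ]
    weights = [softmax(row) for row in scores]
    d_v = len(values[0])
    output = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(d_v)]
        for row in weights
    ]
    return output, weights


# Toy input: n time steps with d features each (assumed sizes).
random.seed(0)
n, d = 48, 4
series = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

# Simplification: the raw input is used as queries, keys and values.
output, weights = vanilla_self_attention(series, series, series)
print(len(weights), len(weights[0]))  # number of rows and columns of the weight matrix
```

Doubling the input length in this sketch quadruples the number of entries in `weights`, which is the scaling problem described above.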

In particular, starting from the implementation of the Informer, the attention mechanisms of Performer and BigBird are integrated into its architecture, resulting in four models to be tested: Informer with the vanilla attention mechanism, Informer with the so-called ProbSparse attention, Informer+Performer, and Informer+BigBird.