Destiny Jarymaya Okpekpe
Capabilities and Application of Deep Learning Recurrent Models.
Advisor: Lia Morra. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Even with the major advances in language modelling in recent years following the introduction of the transformer architecture, reasoning remains one of the human brain's distinctive skills that deep learning models struggle most to replicate. Since one of the main challenges is efficiently recalling information seen in the past, the synthetic Associative Recall (AR) task has gained importance as a good proxy for language modelling and a suitable benchmark for selecting promising large language models. A series of recurrent-gated models (such as H3, Mamba and Hyena), built to overcome the O(L^2) computational complexity of the attention module, have recently gained popularity for solving AR even on long sequences (more than 10,000 tokens).
However, when scaled up and trained on real language tasks, these models still fall short of the performance of transformers.
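To make the Associative Recall task concrete, the sketch below generates one synthetic AR example: a sequence of key-value pairs followed by a query key, where the target is the value previously bound to that key. This is an illustrative toy generator under assumed conventions (token vocabulary, sequence layout, function name `make_ar_example`), not the exact benchmark configuration used in the thesis.

```python
import random

def make_ar_example(num_pairs=4, seed=0):
    """One synthetic Associative Recall example (illustrative sketch).

    The sequence interleaves distinct keys with their values and ends
    with a query key; the target is the value paired with that key.
    """
    rng = random.Random(seed)
    keys = rng.sample(list("abcdefghij"), num_pairs)       # distinct keys
    values = [rng.choice("0123456789") for _ in keys]      # one value per key
    query = rng.choice(keys)                               # ask about a seen key
    # flatten to: k1 v1 k2 v2 ... kN vN query
    sequence = [tok for kv in zip(keys, values) for tok in kv] + [query]
    target = values[keys.index(query)]                     # value to recall
    return sequence, target

seq, tgt = make_ar_example()
# a model reads `seq` token by token and must emit `tgt` at the end
```

A model solves the task only if it can store each key-value binding and retrieve the right one at query time, which is why AR stresses exactly the long-range recall ability that attention handles natively and that recurrent-gated architectures must approximate.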