
The Multiply-And-Max/min Neural Paradigm as a Pruning and Training Accelerator

Lorenzo Nikiforos

The Multiply-And-Max/min Neural Paradigm as a Pruning and Training Accelerator.

Supervisors: Fabio Pareschi, Luciano Prono, Gianluca Setti. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Download (2MB)
Abstract:

Neural networks have revolutionized the field of artificial intelligence, enabling machines to perform complex tasks once exclusive to human cognition. However, large-scale neural networks present significant computational challenges, both during training on servers and when deployed on embedded devices; their high computational cost and resource demands hinder their practical use on low-resource, low-energy devices. Pruning, a technique that systematically removes redundant parameters, has emerged as a promising solution to reduce computational complexity while maintaining performance. This master's thesis explores the effectiveness of a novel layer, Multiply-And-Max/min (MAM), introduced as an alternative to the classical Multiply-And-Accumulate (MAC) approach: the reduction function is not the sum of all products but the sum of only the largest and the smallest. Experimental results demonstrate that the MAM-based approach can significantly sparsify weight matrices under different pruning techniques, in particular Global Gradient Pruning (GGP), which, e.g. on a ViT trained on ImageNet-1K, achieved an accuracy drop of less than 3% while removing 99.93% of the weights. In particular, this study highlights novel properties of MAM neurons. Since the MAM layer's strength is its ability to identify essential interconnections, the MAC layer can be reintroduced after pruning, reducing the number of FLOPs per weight from 3 to 2. The validity of this transition is supported by empirical evidence that the value of the MAM layer lies in identifying crucial interconnections: for example, with a ViT trained on the CIFAR-100 dataset, moving from a deeply pruned MAM structure to a deeply pruned MAC structure leaves the accuracy practically unaltered, going from 79.70% for MAM to 78.95% for MAC. A final experiment prunes the DNN layers before the training process has converged, demonstrating two important properties of the MAM neural paradigm. First, MAM is capable of identifying the crucial interconnections prior to convergence. Second, by leveraging this, significant FLOPs savings can be introduced during training on the server, reducing energy consumption. For example, in the case of AlexNet trained on CIFAR-10, 99.8% of the FLOPs of the pruned layers can theoretically be saved during training with MAM.
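To make the MAM operation concrete, below is a minimal NumPy sketch (not the thesis code) comparing a classical MAC dense layer with a MAM layer in which, for each output neuron, only the largest and the smallest of its input products contribute to the result. The function names, shapes, and random data are illustrative assumptions.

import numpy as np

def mac_forward(x, W):
    # Classical dense layer: y[i] = sum_j W[i, j] * x[j]  (sum reduction)
    return W @ x

def mam_forward(x, W):
    # MAM layer: y[i] = max_j(W[i, j] * x[j]) + min_j(W[i, j] * x[j])
    products = W * x                       # broadcasts x over the rows of W
    return products.max(axis=1) + products.min(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                 # toy input, in_features = 8
W = rng.standard_normal((4, 8))            # toy weights, out_features = 4

print("MAC output:", mac_forward(x, W))
print("MAM output:", mam_forward(x, W))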
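The benefit of reverting a heavily pruned MAM layer to MAC can also be seen with simple FLOP counting, taking from the abstract the figures of 3 FLOPs per weight for MAM (multiply plus the comparisons that update the running max and min) and 2 for MAC (multiply and add). The sketch below is a back-of-the-envelope illustration only: the magnitude-based pruning criterion is a stand-in assumption, since the thesis relies on Global Gradient Pruning (GGP), and the layer sizes and keep ratio are arbitrary.

import numpy as np

FLOPS_PER_WEIGHT_MAC = 2   # multiply + add
FLOPS_PER_WEIGHT_MAM = 3   # multiply + max compare + min compare

def prune_by_magnitude(W, keep_ratio):
    # Zero out all but the largest-magnitude weights (illustrative only).
    k = max(1, int(round(W.size * keep_ratio)))
    threshold = np.sort(np.abs(W), axis=None)[-k]
    return np.where(np.abs(W) >= threshold, W, 0.0)

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))                  # toy dense layer
W_pruned = prune_by_magnitude(W, keep_ratio=0.001)    # keep ~0.1% of weights

nnz = np.count_nonzero(W_pruned)
dense_mac_flops = FLOPS_PER_WEIGHT_MAC * W.size
pruned_mam_flops = FLOPS_PER_WEIGHT_MAM * nnz
pruned_mac_flops = FLOPS_PER_WEIGHT_MAC * nnz         # after reverting MAM -> MAC

print(f"surviving weights: {nnz} / {W.size}")
print(f"dense MAC FLOPs:   {dense_mac_flops}")
print(f"pruned MAM FLOPs:  {pruned_mam_flops}")
print(f"pruned MAC FLOPs:  {pruned_mac_flops}")
print(f"FLOPs saved vs dense: {1 - pruned_mac_flops / dense_mac_flops:.2%}")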

Supervisors: Fabio Pareschi, Luciano Prono, Gianluca Setti
Academic year: 2023/24
Publication type: Electronic
Number of pages: 68
Subjects:
Degree course: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/31860