Politecnico di Torino

Structured Pruning of Vision Transformers at Training Time

Leonardo Tredese. Structured Pruning of Vision Transformers at Training Time. Rel. Daniele Jahier Pagliari, Alessio Burrello, Matteo Risso, Beatrice Alessandra Motetti. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Attention-based transformers have emerged as a powerful paradigm, achieving state-of-the-art results on tasks such as natural language processing and computer vision. However, transformers typically have higher computational costs and parameter counts than convolutional networks. This inefficiency hinders their deployment on resource-constrained devices such as edge devices. Structured pruning techniques are a promising direction for compressing transformers in the edge computing scenario. This thesis investigates pruning techniques that induce structured sparsity in vision transformers, reducing computational requirements while minimizing accuracy degradation. The goal is to develop methodologies for efficient vision transformer inference.

Structured pruning learns importance scores for individual network components at training time by solving an optimization problem that seeks to maximize task performance while minimizing the number of parameters in the model. The importance scores are then transformed into binary masks that prune unimportant structures, such as specific output dimensions of linear layers or entire attention heads.

To promote regularity in the induced sparsity patterns, various mask-sharing strategies are proposed that couple pruning decisions across related architectural elements. Regularity is crucial because, with fully independent masks, the specific connectivity pattern of transformers prevents some masked components from being physically removed, which leads to lower compression rates when the model is actually deployed on hardware. Empirical results on image classification tasks show that completely independent masking of components achieves a better balance between accuracy and sparsity than the sharing strategies.
Nonetheless, experiments show that, through a mix of shared and independent masks, the proposed pruning scheme compresses vision transformers by up to 90% with an accuracy drop of only 4%, or by 70% with an accuracy drop of less than 1%.
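The masking and mask-sharing ideas summarized above can be illustrated with a minimal, framework-free Python sketch. It is not the thesis's implementation: the function names (`binarize`, `removable_channels`, `nominal_vs_deployed`, `param_penalty`), the zero threshold, the score values, and the choice of a residual stream as the coupled structure are all hypothetical, chosen only for illustration. The sketch thresholds importance scores into keep-masks and shows why masks chosen independently for layers that all write into the same residual channels can mask many entries yet free few channels for removal, whereas a single shared mask makes every masked channel removable.

```python
# Illustrative sketch only: all names, scores and the residual-stream setting
# are hypothetical assumptions, not taken from the thesis.


def binarize(scores, threshold=0.0):
    """Map real-valued importance scores to a 0/1 keep-mask (1 = keep)."""
    return [1 if s > threshold else 0 for s in scores]


def removable_channels(masks):
    """Residual channels that can be physically deleted.

    Every block adds its output into the same residual channels, so a channel
    can only be dropped if *all* writing blocks mask it out.
    """
    n = len(masks[0])
    return [c for c in range(n) if all(m[c] == 0 for m in masks)]


def nominal_vs_deployed(masks):
    """Return (fraction of zeroed mask entries, fraction of removable channels)."""
    total = sum(len(m) for m in masks)
    nominal = sum(m.count(0) for m in masks) / total
    deployed = len(removable_channels(masks)) / len(masks[0])
    return nominal, deployed


def param_penalty(masks, params_per_channel, lam=1e-3):
    """Hypothetical regularizer: lam times the number of parameters kept."""
    return lam * params_per_channel * sum(sum(m) for m in masks)


# Independent strategy: each of three blocks has its own scores over 8 channels.
independent_scores = [
    [0.9, -0.2, 0.4, -0.7, 0.1, -0.3, 0.8, -0.5],
    [-0.1, 0.6, 0.3, -0.4, -0.2, 0.7, 0.5, -0.9],
    [0.2, -0.8, -0.6, -0.1, 0.9, -0.4, 0.3, -0.2],
]
independent = [binarize(s) for s in independent_scores]

# Shared strategy: one score vector, so every block reuses the same mask.
shared_scores = [0.9, -0.2, 0.4, -0.7, 0.1, -0.3, 0.8, -0.5]
shared = [binarize(shared_scores)] * 3

for name, masks in (("independent", independent), ("shared", shared)):
    nominal, deployed = nominal_vs_deployed(masks)
    print(f"{name}: nominal={nominal:.2f}, deployed={deployed:.2f}")
# independent: nominal=0.54, deployed=0.25
# shared: nominal=0.50, deployed=0.50
```

With these made-up scores, independent masking zeroes about 54% of the mask entries, but only 2 of the 8 residual channels can actually be deleted; the shared mask zeroes 50% of the entries and frees 4 channels. In real training the thresholding step is not differentiable, so gradients would have to reach the scores through a surrogate such as a straight-through estimator; this sketch covers only the forward computation and the parameter-count penalty term.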

Supervisors: Daniele Jahier Pagliari, Alessio Burrello, Matteo Risso, Beatrice Alessandra Motetti
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 56
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New organization > Master of Science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/29328