Integrating Neural Processing Unit and Attention-based Architecture for Efficient Real-time Face Recognition in Industrial Environments

Davide Aiello


Supervisors: Luciano Lavagno, Ilario Gerlero, Marcello Babbi. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2024

Abstract:	In recent years, Transformer models and attention mechanisms have reshaped machine learning, particularly computer vision and natural language processing. Attention-based architectures let models prioritize and capture complex relationships within the data, but deploying Transformer models on microcontrollers remains a challenge. At the same time, Neural Processing Units (NPUs) have become key hardware accelerators designed specifically for deep learning workloads. Their efficiency and low power consumption make them well suited to real-time applications in constrained environments. This study develops a deep learning pipeline that uses attention-based models for real-time face recognition, targeting a cutting-edge microcontroller equipped with an NPU. The work covers the complete development process of a deep learning system on the edge: selection of the neural networks, potential design modifications and training, followed by quantization, deployment and, ultimately, execution on the target hardware. In pursuit of this goal, the work highlights the pivotal role of EdgeAI in driving the next generation of smart devices, emphasizing its potential along with its foundational principles. The system’s pipeline is examined in detail across all its stages, with a particular focus on lightweight neural network models tailored to the constraints of microcontroller environments. An exhaustive review of the models suitable for each task is provided, along with the relevant datasets, post-processing techniques and validation methods for the selected models. An innovative aspect of this research is the successful adaptation of Transformer-based architectures to microcontrollers by means of a novel attention mechanism called Convolutional Self-Attention. Transformer models are analyzed comprehensively, with a focus on their strengths, their weaknesses and their integration with convolutional neural networks in the emerging trend toward hybrid architectures. The typical constraints that prevent their deployment on microcontrollers are also examined, and the work demonstrates how these challenges can be overcome to achieve successful execution. The result is a fully autonomous system that processes the entire pipeline within milliseconds and integrates the most advanced family of neural network architectures, pushing the boundaries of what is achievable in embedded systems.
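The abstract names quantization as one stage between training and deployment on the NPU. Since the full text is not available, the sketch below is only an illustration of what such a stage commonly involves: a generic affine (asymmetric) int8 quantization of a list of floats, written in plain Python. The function names `quantize_int8` and `dequantize_int8` are hypothetical, and nothing here should be read as the scheme the thesis actually uses.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Affine (asymmetric) quantization of floats to the int8 range.

    Returns the quantized integers together with the scale and zero point
    needed to map them back to floats. Illustrative only; not the thesis's
    method.
    """
    qmin, qmax = -128, 127
    # Extend the observed range to include 0.0 so that zero maps to an
    # exact integer (useful for padding and ReLU outputs).
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 when every input is zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize_int8(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 2.0]
    q, scale, zp = quantize_int8(weights)
    print(q, scale, zp)
    print(dequantize_int8(q, scale, zp))
```

Each value is stored in one byte instead of four, at the cost of a rounding error of roughly half a quantization step per value. Trade-offs of this kind are what make quantized models attractive on memory-constrained microcontrollers.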
Supervisors:	Luciano Lavagno, Ilario Gerlero, Marcello Babbi
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	98
Additional information:	Confidential thesis. Full text not available
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New system > Master's degree > LM-32 - COMPUTER ENGINEERING (INGEGNERIA INFORMATICA)
Collaborating companies:	SENSOR REPLY S.R.L. CON UNICO SOCIO
URI:	http://webthesis.biblio.polito.it/id/eprint/33094
