Reliability assessment and software-based hardening of a hyperspectral image classifier for GPUs

Sergiu-Mohamed Abed

Supervisors: Josie Esteban Rodriguez Condia, Matteo Sonza Reorda, Juan David Guerrero Balaguera. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2024

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Over the last few years, artificial intelligence (AI) has been adopted across many domains and sectors. One such domain is edge computing, where Internet of Things (IoT) devices are now designed to run neural networks, so that computation happens at the source of the data and both latency and network traffic are reduced. As a result, AI faces a challenge that is common in embedded systems deployed in harsh, uncontrolled environments: reliability. Many methods have been studied to assess the reliability of neural network models, but more research is needed to understand how hardware-level faults affect their performance. This thesis focuses on the impact of transient faults in the device hardware on the performance of a hyperspectral image classifier. To this end, NVBitFI, a tool developed by NVIDIA, was used to emulate transient faults by injecting errors at the instruction level of the classifier. An extensive set of fault-injection simulations, performed under different configurations of the classifier, made it possible to quantify the share of outcomes in which the output changed without affecting the classification (15.89%), the share in which the output changed and the classification was altered (24.89%), and the share that ended in a crash or hang of the model (11.47%). It was also possible to identify the most sensitive parts of the classifier, i.e., those that, when subjected to faults, accounted for most of the changes in the model's behavior. Lastly, a software-level hardening technique was applied to the critical parts of the classifier to mitigate the effects of transient faults and thereby increase its fault tolerance.
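The outcome categories reported above can be illustrated with a small sketch. It is not the thesis's tooling and does not use NVBitFI. The names `Outcome`, `classify_run`, and `tally` are hypothetical, a run is assumed to produce either a list of class scores or `None` for a crash or hang, and the `MASKED` category (output identical to the fault-free run) is an assumption that the abstract does not mention.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Optional, Sequence


class Outcome(Enum):
    MASKED = "output identical to the fault-free run"      # assumed category
    SDC_TOLERABLE = "output changed, same predicted class"
    SDC_CRITICAL = "output changed, different predicted class"
    DUE = "crash or hang"


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score, i.e. the predicted class."""
    return max(range(len(scores)), key=scores.__getitem__)


def classify_run(golden: Sequence[float],
                 faulty: Optional[Sequence[float]]) -> Outcome:
    """Compare one fault-injected run against the fault-free (golden) run.

    `faulty` is None when the run crashed or hung (assumed convention).
    """
    if faulty is None:
        return Outcome.DUE
    if list(faulty) == list(golden):  # exact equality is assumed here
        return Outcome.MASKED
    if argmax(faulty) == argmax(golden):
        return Outcome.SDC_TOLERABLE
    return Outcome.SDC_CRITICAL


def tally(golden: Sequence[float],
          runs: Sequence[Optional[Sequence[float]]]) -> Dict[Outcome, float]:
    """Percentage of runs per outcome; `runs` is assumed to be non-empty."""
    counts = Counter(classify_run(golden, run) for run in runs)
    return {outcome: 100.0 * counts[outcome] / len(runs) for outcome in Outcome}
```

Separating the two silent-corruption cases matters for a classifier: a perturbed score vector that still yields the same class is harmless to the application, while a changed class is not.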
Future work should apply a similar analysis to other kinds of applications that rely on the same libraries as the hyperspectral image classifier studied here (i.e., cuBLAS and PyTorch), to determine whether similar trends appear across different scenarios and, if so, to improve the fault tolerance of the susceptible functions.
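The abstract does not say which software-level hardening technique was applied. As one generic illustration of how a susceptible function can be protected in software, the sketch below uses duplication with comparison: the computation runs twice and the result is accepted only if both executions agree. The helper `hardened_call` and its `max_retries` parameter are hypothetical, and the kernel is assumed to be deterministic and to return a plain list.

```python
from typing import Callable, List, Sequence


def hardened_call(kernel: Callable[[Sequence[float]], List[float]],
                  inputs: Sequence[float],
                  max_retries: int = 2) -> List[float]:
    """Duplication with comparison (a generic pattern, not the thesis's method).

    Runs `kernel` twice per attempt and returns the result only when both
    executions agree. A transient fault that corrupts one execution causes
    a mismatch, and the pair is simply recomputed.
    """
    for _ in range(max_retries + 1):
        first = kernel(inputs)
        second = kernel(inputs)
        if first == second:
            return first
    # Repeated disagreement suggests the fault is not transient.
    raise RuntimeError("redundant executions kept disagreeing")
```

The cost is at least twice the execution time of the protected function, which is why this kind of hardening is usually limited to the parts identified as most sensitive.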

Supervisors: Josie Esteban Rodriguez Condia, Matteo Sonza Reorda, Juan David Guerrero Balaguera
Academic year: 2024/25
Publication type: Electronic
Number of pages: 70
Subjects:
Degree programme: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New degree system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/33880