A Comprehensive Overview of Fault Tolerance Techniques for Convolutional Deep Neural Networks

Vincenzo De Marco

A Comprehensive Overview of Fault Tolerance Techniques for Convolutional Deep Neural Networks.

Supervisor: Lia Morra. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2023


PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial Share Alike.

Abstract:	Deep Neural Networks (DNNs) are increasingly used in safety-critical applications, from healthcare to autonomous driving. Furthermore, as new frontiers unfold, such as the growing New Space Economy, DNNs are expected to be widely deployed in outer space and other hazardous environments in the near future. However, their prediction accuracy has been shown to degrade in the presence of transient hardware faults, leading to unpredictable and potentially catastrophic errors. As such faults are frequent in radiation-prone domains such as space, it is of the utmost importance to strengthen DNNs' resistance to computational or parameter errors. The relevant literature has investigated multiple fault tolerance techniques that limit the consequences of potential faults. Nonetheless, most techniques available today rely mainly on hardware redundancy, which can be unsustainable for mass DNN deployment in out-of-reach scenarios. Consequently, in recent years, several techniques have been proposed to increase the fault tolerance of DNNs by modifying the network structure and/or training procedure, thereby reducing the need for costly additional hardware. These methods, however, have been tested only on a few benchmark datasets and models, which are often not representative of real-world applications. Hence, this study aims to categorize existing techniques and to further enhance the one most suitable for its objectives, an activation clipping strategy called ClipAct. The goal is to propose and evaluate a solid baseline that increases DNNs' fault tolerance without introducing hardware or runtime overhead. This method is designed to be easily adaptable to a broad variety of use cases and implementations, and to be complemented with more tailored solutions, depending on the desired application and the available resources.
Multiple experiments were carried out using various widely recognized models and several datasets, including four classification tasks related to Earth Observation. The experimental results highlight the effectiveness of the investigated fault tolerance techniques, although the results vary across datasets and DNN architectures.
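The abstract does not spell out how activation clipping works, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes a float32 weight hit by a single bit flip and a clipped ReLU that maps out-of-range activations to zero; the names flip_bit and clipped_relu, the threshold of 6.0, and the choice of zeroing rather than saturating are all illustrative assumptions.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of value's float32 encoding."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def relu(x: float) -> float:
    """Standard, unbounded ReLU."""
    return max(0.0, x)


def clipped_relu(x: float, threshold: float) -> float:
    """ReLU that zeroes activations above threshold (assumed clipping variant)."""
    y = relu(x)
    return y if y <= threshold else 0.0


# Hypothetical single-neuron example: one weight, one input.
weight = 0.5
faulty_weight = flip_bit(weight, 30)  # flip the most significant exponent bit
x = 2.0

print("fault-free activation:", relu(weight * x))
print("faulty, unbounded ReLU:", relu(faulty_weight * x))
print("faulty, clipped ReLU:", clipped_relu(faulty_weight * x, threshold=6.0))
```

In this sketch the flipped exponent bit turns a small weight into a huge one, and the unbounded ReLU passes the resulting activation on to later layers, whereas the clipped version suppresses it. In practice the threshold would have to be chosen per layer from the fault-free activation range, which this sketch does not attempt.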
Supervisors:	Lia Morra
Academic year:	2023/24
Publication type:	Electronic
Number of Pages:	98
Subjects:
Degree programme:	Master's degree programme in Data Science and Engineering
Degree class:	New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies:	UNSPECIFIED
URI:	http://webthesis.biblio.polito.it/id/eprint/28512
