
Study and analysis of training strategies to improve the reliability of artificial neural networks.

Gabriel Alejandro Ceron Viveros

Supervisors: Edgar Ernesto Sanchez Sanchez, Annachiara Ruospo. Politecnico di Torino, Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2021

License: Creative Commons Attribution Non-commercial No Derivatives.

In recent years, the use of machine learning applications has grown, driven by increasing computational power and by more advanced techniques for training and deploying these algorithms. Machine learning applications can be trained on real-world data to perform a task without being explicitly programmed. One of the most popular and widely used machine learning algorithms is the artificial neural network (ANN), and especially the deep neural network (DNN), which has been shown to perform even above human-level accuracy. Thanks to this performance, DNNs have been applied successfully in areas such as avionics, automotive systems, and medical devices. Some of these areas are considered safety-critical because system failures can endanger human lives. For this reason, the research community has shown increasing interest in recent decades in understanding and improving the reliability of these computing models. In this work we present a study and analysis of different methods of training DNNs to improve their reliability. Specifically, we train an 18-layer residual network (ResNet18) with different training parameters and optimizers to measure how much the accuracy of the trained models decreases in the presence of faults. We perform fault injection at the software level using the PyTorchFI library, which runs on top of the PyTorch framework. The fault model is a weight fault: when the affected weight is accessed, it reads as zero instead of its real value. Faults are injected progressively up to a total of 0.1% and 1% of the trainable parameters of ResNet18 (11 million). The results show subtle relations between reliability and the tested training parameters, such as batch size and weight decay. These parameters behave differently with each optimizer, so specific values are recommended for each optimizer.
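The zero-weight fault model described above can be illustrated with a minimal sketch. It uses plain Python lists rather than PyTorch tensors and does not use the PyTorchFI API; the function name `inject_zero_faults`, the toy weight list, and the fixed seed are assumptions made purely for illustration, not the thesis code. For 11 million parameters, the two fault totals in the text correspond to 11,000 weights (0.1%) and 110,000 weights (1%).

```python
import random


def inject_zero_faults(weights, fraction, seed=0):
    """Return a faulty copy of `weights` plus the indices that were zeroed.

    A random `fraction` of the entries is forced to 0.0, mimicking a weight
    that reads as zero when accessed. The input list is left untouched.
    """
    rng = random.Random(seed)  # fixed seed so the campaign is repeatable
    n_faults = round(len(weights) * fraction)
    # sample() picks distinct positions, so exactly n_faults weights are hit
    fault_sites = rng.sample(range(len(weights)), n_faults)
    faulty = list(weights)
    for i in fault_sites:
        faulty[i] = 0.0
    return faulty, sorted(fault_sites)


# Toy stand-in for a flattened parameter vector (hypothetical values).
weights = [0.5] * 10_000
faulty, sites = inject_zero_faults(weights, 0.001)
print(len(sites))  # 10 faulty weights, i.e. 0.1% of 10,000
```

In a real campaign the faulty copy would replace the model weights before evaluating accuracy, and the fraction would be increased step by step until the target total is reached.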
However, the most important result of this work comes from comparing reliability across optimizers. After selecting the models that showed the best reliability in the fault injection campaigns, we observe that the optimizer that performs best in terms of reliability is SGD, followed by Adagrad, then RMSprop, and finally Adam. A further experiment was carried out with the SGD optimizer using a bit-flip fault model, to determine whether any bit of the weight (float-64) is more sensitive to this kind of fault. To do so, we injected 1k and 10k faults into random weights, flipping the same bit position each time. The results showed that bits 54 and 62 are very sensitive to this fault: because these bits belong to the exponent of the floating-point number, flipping them reduces the accuracy significantly.
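The effect of a single bit-flip on a 64-bit floating-point weight can be sketched with the Python standard library. This is an illustrative example only, not the thesis code: the helper name `flip_bit` and the sample weight 0.5 are assumptions, and bits are assumed to be numbered from 0 (least significant) to 63 (sign), so that in the IEEE 754 binary64 layout bits 0 to 51 hold the fraction and bits 52 to 62 hold the exponent.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit of the IEEE 754 binary64 encoding of `value`.

    Bit 0 is the least significant fraction bit, bits 52-62 are the
    exponent, and bit 63 is the sign.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as an unsigned 64-bit integer, flip, convert back.
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


w = 0.5  # hypothetical weight value
print(flip_bit(w, 51))  # 0.75: top fraction bit, moderate change
print(flip_bit(w, 54))  # 0.03125: exponent bit, value shrinks by 2**4
print(flip_bit(w, 62))  # about 8.99e307: top exponent bit, value explodes
```

For a small weight such as 0.5, a flip in the fraction changes the value by less than a factor of two, whereas a flip in the exponent rescales it by a power of two, which is consistent with the sensitivity of the exponent bits reported above.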

Supervisors: Edgar Ernesto Sanchez Sanchez, Annachiara Ruospo
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 100
Degree programme: Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/21314