Fault Injection techniques for GPU Reliability Evaluation

Luigi Galasso

Fault Injection techniques for GPU Reliability Evaluation.

Supervisors: Matteo Sonza Reorda, Juan David Guerrero Balaguera. Politecnico di Torino, Master's degree course in Electronic Engineering (Ingegneria Elettronica), 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

A Graphics Processing Unit (GPU) is a computer chip that renders graphics and images by performing rapid mathematical calculations. In recent years, however, GPUs have also been exploited beyond graphics processing as General-Purpose GPUs (GPGPUs): they work as hardware accelerators for high-performance computing in many different fields, including safety-critical applications. In these domains, Convolutional Neural Networks (CNNs) represent a widely used computing approach that is well supported by GPUs, since they leverage data- and thread-level parallelism. For this reason, the reliability of GPUs must be evaluated to meet the desired requirements. To achieve this objective, it is necessary to study GPU behavior in the presence of hardware faults. This thesis project analyzes, in particular, permanent faults affecting GPU functionalities. A permanent fault persists indefinitely (or at least until repair) after its occurrence: it manifests as stuck-at bits in the architecture, that is, lines that always carry the logical signal “0” or “1”. Such faults can be mimicked in software by injecting errors into the code running on the GPU; this can be obtained by masking, at the assembly level, one or more bits of a selected register after the corresponding instruction is executed. Therefore, in this work, a framework based on a binary instrumentation tool (NVBitFI) has been developed to perform permanent fault injection campaigns. Several injection techniques were elaborated to target distinct elements inside a GPU: the register files and the functional units. The presented environment has been used to test NVIDIA GPUs with a specific CNN target application. To support the model used, many fault simulations were performed, and the obtained results were analyzed and compared.
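
The stuck-at masking described in the abstract can be illustrated with a minimal sketch. This is not the thesis framework and it does not use the NVBitFI API: it is a plain Python model under stated assumptions, namely 32-bit registers, a toy program made of `(dest, op, src_a, src_b)` tuples, and the hypothetical helper names `stuck_at` and `execute`. The point it shows is that a permanent fault is re-applied every time the target register is written, unlike a transient bit flip that happens once.

```python
import operator

MASK32 = 0xFFFFFFFF  # assumption: registers are modelled as 32-bit words


def stuck_at(value, bit_mask, stuck_value):
    """Force every bit selected by bit_mask to stuck_value (0 or 1)."""
    if stuck_value not in (0, 1):
        raise ValueError("stuck_value must be 0 or 1")
    if stuck_value == 1:
        return (value | bit_mask) & MASK32   # stuck-at-1: OR the mask in
    return value & ~bit_mask & MASK32        # stuck-at-0: clear the masked bits


def execute(program, regs, fault=None):
    """Run a toy program on a copy of regs and return the final registers.

    program: list of (dest, op, src_a, src_b) tuples (hypothetical format).
    fault:   optional dict with keys "reg", "mask", "value"; the mask is
             applied after every instruction that writes fault["reg"],
             which is what makes the modelled fault permanent.
    """
    regs = dict(regs)
    for dest, op, src_a, src_b in program:
        result = op(regs[src_a], regs[src_b]) & MASK32
        if fault is not None and dest == fault["reg"]:
            result = stuck_at(result, fault["mask"], fault["value"])
        regs[dest] = result
    return regs


# Toy program: r2 = r0 + r1, then r3 = r2 * r2.
program = [("r2", operator.add, "r0", "r1"),
           ("r3", operator.mul, "r2", "r2")]
initial = {"r0": 3, "r1": 4, "r2": 0, "r3": 0}

golden = execute(program, initial)                       # fault-free run
faulty = execute(program, initial,
                 fault={"reg": "r2", "mask": 0b1, "value": 0})
corrupted = golden["r3"] != faulty["r3"]                 # fault reached the output
```

In this toy run the least significant bit of `r2` is stuck at 0, so the fault-free value 7 becomes 6 and the error propagates to `r3`. Comparing the faulty result against the fault-free ("golden") one is the usual way to classify the outcome of each injection in a campaign.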

Supervisors: Matteo Sonza Reorda, Juan David Guerrero Balaguera
Academic year: 2021/22
Publication type: Electronic
Number of Pages: 60
Degree course: Master's degree course in Electronic Engineering (Ingegneria Elettronica)
Degree class: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/22769