
Reliability issues in GPGPUs

Chiara Penaglia

Reliability issues in GPGPUs. Supervisors: Matteo Sonza Reorda, Luca Sterpone, Josie Esteban Rodriguez Condia, Boyang Du. Politecnico di Torino, Master's degree course in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica), 2019.

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

The present work discusses approaches for implementing software redundancy schemes on the open-source GPGPU model FlexGrip in order to increase the reliability of a GPGPU.

Most GPUs do not feature hardware support for error detection, and in a device such as a GPGPU a corrupted result may be unacceptable, since applications such as machine vision rely on the correctness of the processed image. A fault can occur at any time during the operation of the device, and it is critical that it be either masked or detected. Therefore, in the absence of hardware support, software redundancy appears to be the only way to improve the reliability of a GPGPU and avoid errors.

In this thesis, three fault-tolerant versions of a matrix multiplication program were developed and the performance of each was recorded. The three approaches differ in the way they guarantee a correct result. The first is duplication with comparison (DWC): each operation is performed twice and the two results are compared; only if they are equal is the result stored in memory. The second is triple modular redundancy (TMR), which triplicates the resources and uses a voter that selects the correct value by majority. The last method studied is algorithm-based fault tolerance (ABFT), which uses comparisons to identify the cell in which an error occurred and then corrects it.

Each program was tested on the FlexGrip model after injecting static faults into the register file of each streaming multiprocessor (SM). The expected result of each program obtained in fault-free simulation (the "golden output") was compared with the result obtained in the presence of the injected static faults.

Finally, the results were collected and the fault coverage was analysed, together with the execution time and memory usage of each approach. Future tests could use different fault models, such as transient or delay faults, under which the behaviour of the circuit may vary unpredictably.
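The three schemes described above can be illustrated with a small CPU-side sketch. This is not the FlexGrip/CUDA code used in the thesis; it is a minimal Python illustration under stated assumptions: integer matrices (so results can be compared exactly), a fault that corrupts only one replica (injected by a hypothetical `faulty` helper that flips one bit), and, for ABFT, the classic row/column checksum encoding of Huang and Abraham, which the abstract does not name explicitly. All function names are illustrative.

```python
from itertools import count
from typing import Callable, List

Matrix = List[List[int]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; stands in for the GPU kernel."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def dwc(compute: Callable[[], Matrix]) -> Matrix:
    """Duplication with comparison: run twice, keep the result only if both copies agree."""
    first, second = compute(), compute()
    if first != second:
        raise RuntimeError("DWC: mismatch detected, result not stored")
    return first

def tmr(compute: Callable[[], Matrix]) -> Matrix:
    """Triple modular redundancy: run three times and vote element by element."""
    r0, r1, r2 = compute(), compute(), compute()
    voted = []
    for row0, row1, row2 in zip(r0, r1, r2):
        out = []
        for x, y, z in zip(row0, row1, row2):
            if x == y or x == z:
                out.append(x)  # x agrees with at least one other copy
            elif y == z:
                out.append(y)  # x is the odd one out
            else:
                raise RuntimeError("TMR: no majority")
        voted.append(out)
    return voted

def abft_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Checksum-encoded product (assumed Huang-Abraham style ABFT)."""
    a_c = a + [[sum(col) for col in zip(*a)]]  # extra row: column sums of A
    b_r = [row + [sum(row)] for row in b]      # extra column: row sums of B
    return matmul(a_c, b_r)                    # (n+1) x (p+1) full-checksum matrix

def abft_correct(c_f: Matrix) -> Matrix:
    """Locate and fix one corrupted data element (in place), then return the n x p data."""
    n, p = len(c_f) - 1, len(c_f[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_f[i][:p]) != c_f[i][p]]
    bad_cols = [j for j in range(p)
                if sum(c_f[i][j] for i in range(n)) != c_f[n][j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Shift the bad element by the gap between stored and actual row sum.
        c_f[i][j] += c_f[i][p] - sum(c_f[i][:p])
    elif len(bad_rows) + len(bad_cols) > 1:
        raise RuntimeError("ABFT: error pattern not correctable")
    # A single mismatching row OR column means a checksum itself was hit: data is intact.
    return [row[:p] for row in c_f[:n]]

def faulty(a: Matrix, b: Matrix, bad_call: int) -> Callable[[], Matrix]:
    """Hypothetical fault injector: run number bad_call (0-based) gets one bit flipped."""
    calls = count()

    def compute() -> Matrix:
        c = matmul(a, b)
        if next(calls) == bad_call:
            c[0][1] ^= 1 << 4  # flip bit 4 of one output element (needs >= 2 columns)
        return c

    return compute

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul(A, B))                   # [[19, 22], [43, 50]] (golden output)
    print(tmr(faulty(A, B, bad_call=1)))  # [[19, 22], [43, 50]]: fault masked
    try:
        dwc(faulty(A, B, bad_call=0))
    except RuntimeError as err:
        print(err)                        # DWC: mismatch detected, result not stored
    c_f = abft_matmul(A, B)
    c_f[1][0] += 7                        # inject an error into one data cell
    print(abft_correct(c_f))              # [[19, 22], [43, 50]]: error corrected
```

The sketch highlights the trade-off the thesis measures: DWC only detects (two runs), TMR masks a single faulty copy (three runs), and ABFT corrects a single erroneous cell with one run plus checksum overhead. Two caveats apply to this illustration: with floating-point data the equality checks would need a tolerance, and a permanent fault that corrupted every replica in the same way would defeat DWC and TMR, which is why the replicas are assumed here to use different resources.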

Supervisors: Matteo Sonza Reorda, Luca Sterpone, Josie Esteban Rodriguez Condia, Boyang Du
Academic year: 2019/20
Publication type: Electronic
Number of pages: 53
Subjects:
Degree course: Master's degree course in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica)
Degree class: New regulations > Master's degree > LM-29 - Electronic Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/13230