
A Comparative Analysis of Methods and Tools for Identification of GPU-friendly Algorithms

Jacopo Pati

A Comparative Analysis of Methods and Tools for Identification of GPU-friendly Algorithms.

Supervisors: Alessandro Savino, Giulio Gambardella. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2023

License: Creative Commons Attribution Non-commercial No Derivatives.


GP-GPUs (General-Purpose Graphics Processing Units) have gained prominence in recent years for solving advanced computational problems. Using GPUs beyond computer graphics makes it possible to exploit their high compute performance and parallelism and to relieve the CPU of burdensome workloads. The aim of this thesis is to analyze automatic methodologies that help developers find code worth accelerating, by studying ways to identify functions amenable to GPU acceleration without prior knowledge of, or assumptions about, the code base. The first parameter useful in identifying such code is Arithmetic Intensity (AI), the ratio of floating-point operations performed to bytes of memory traffic. The higher the value, the more likely the code is to benefit from GPU offloading. Although AI is independent of the hardware characteristics, it can be related to the FLOP/s of the machine through Roofline analysis, which also identifies whether the considered functions are memory bound or compute bound. We analyzed several tools that compute AI with little effort on the developer's side, namely Intel VTune together with SDE, RRZE LIKWID, and Intel Advisor. A set of publicly available benchmarks (KernelGen Test Suite) was used to evaluate and compare the tools in depth, relying on third-party GPU implementations of the code base (e.g., CUDA) as a golden reference. We show that Intel Advisor provides the best evaluation, thanks to its ability to model the advantages of GPU execution. Given the limits of AI as a standalone metric for GPU offload potential, we extended our analysis to include modeling of the GPU speed-up over the CPU counterpart, using this Intel Advisor feature to obtain further hints, and we evaluated the approach end to end on the open-source Static Timing Analysis (STA) tool OpenTimer. Finally, given the estimates provided by the benchmarked tools, we ported a core function of OpenTimer to GPU with minimal code changes using OpenACC to validate the AI evaluation.
Although AI evaluation is an important first step in assessing GPU offload opportunities, we conclude that it does not provide enough information to fully estimate the possible advantage of GPU execution. The evaluation is based on the CPU implementation and does not account for techniques, such as batching or vectorization, that could increase the GPU advantage.
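
As an illustration of the Arithmetic Intensity and Roofline reasoning summarized in the abstract, the following is a minimal sketch and is not taken from the thesis. It uses the textbook definitions: AI as FLOPs per byte of memory traffic, and attainable performance as the minimum of the peak compute rate and AI times the peak memory bandwidth. The function names, the SAXPY-style kernel, and the machine peaks (1 TFLOP/s, 100 GB/s) are all hypothetical values chosen only for the example.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return FLOPs performed per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def roofline_attainable(ai: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Attainable FLOP/s under the basic Roofline model.

    peak_flops is in FLOP/s and peak_bandwidth in bytes/s.
    """
    return min(peak_flops, ai * peak_bandwidth)


def bound_type(ai: float, peak_flops: float, peak_bandwidth: float) -> str:
    """Classify a kernel by comparing its AI with the machine's ridge point."""
    ridge_point = peak_flops / peak_bandwidth  # FLOPs per byte
    return "compute-bound" if ai >= ridge_point else "memory-bound"


if __name__ == "__main__":
    # Hypothetical kernel: y[i] = a * x[i] + y[i] in single precision.
    # Per element: 2 FLOPs (multiply, add), 12 bytes (read x, read y, write y).
    ai = arithmetic_intensity(flops=2, bytes_moved=12)

    # Hypothetical machine peaks, assumed for illustration only.
    peak_flops = 1e12       # 1 TFLOP/s
    peak_bandwidth = 100e9  # 100 GB/s

    print(f"AI = {ai:.3f} FLOP/byte")
    print(f"attainable = {roofline_attainable(ai, peak_flops, peak_bandwidth):.3e} FLOP/s")
    print(bound_type(ai, peak_flops, peak_bandwidth))
```

With these assumed numbers the kernel's AI falls well below the ridge point of 10 FLOP/byte, so the model classifies it as memory bound. This is the kind of classification the abstract attributes to Roofline analysis, and it also shows why AI alone cannot predict the actual GPU speed-up.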

Supervisors: Alessandro Savino, Giulio Gambardella
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 121
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Joint supervision institution: Synopsys (Ireland)
Collaborating companies: SYNOPSYS INTERNATIONAL LTD
URI: http://webthesis.biblio.polito.it/id/eprint/26759