Politecnico di Torino

Hardware Acceleration of 5G LDPC using datacenter-class FPGAs

Luca Romani

Supervisors: Luciano Lavagno, Salvatore Scarpina. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Download (1MB) | Preview

Low-density parity-check (LDPC) codes are error-correcting codes discovered in 1963 by Robert Gallager. They were neglected for many years because of the computational requirements needed to achieve their theoretical performance. LDPC codes can work with different block lengths and high code rates, and they provide a good bit error rate in noisy channels together with a high throughput. The aim of this work is to accelerate the LDPC decoder, which is a performance-critical module in 5G because of its iterative algorithm. The flexibility of FPGAs is exploited to accelerate the decoder, starting from a software implementation provided by the OpenAirInterface Software Alliance consortium (OAI). The OAI code is explored and optimized inside the Xilinx SDAccel development environment; the corresponding bitstream is then generated and uploaded to a Xilinx FPGA. The optimization work started from C code for Intel processors that use the AVX2 instruction set extension. The AVX2 solution was later discarded because its code structure is not synthesizable by the Vivado HLS tool. As an alternative, CUDA code for GPUs was chosen for deployment on the FPGA. It was first ported to OpenCL and then optimized using high-level synthesis techniques, in particular array reshaping, loop pipelining, loop unrolling, and loop fusion. Global memory accesses are fully optimized by means of burst read and write operations, and the memory ports are widened. To speed up data transfer during computation, on-chip memory is used instead of off-chip memory, which would have increased the latency of the application. Four different performance results are obtained: the first for the AVX2 code on Intel CPUs, the second for the software emulation of the OpenCL code, the third for the CUDA code on the GPU, and the last for the FPGA.
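The abstract does not spell out the iterative decoding algorithm. As a minimal sketch of what an iterative LDPC decoder does, the Python code below implements plain min-sum message passing with a flooding schedule and early termination on a zero syndrome. Assumptions: the choice of min-sum, the function name `min_sum_decode`, and the toy matrix `H` (the small (7,4) Hamming parity-check matrix, which is neither low-density nor a 5G NR base graph) are all illustrative; this is not the OAI implementation.

```python
# Plain min-sum LDPC decoding with a flooding schedule (illustrative sketch).
# Assumption: this is NOT the OAI decoder; H below is a toy stand-in matrix.


def min_sum_decode(H, llr, max_iter=10):
    """Decode channel LLRs (positive means bit 0 is more likely).

    H: parity-check matrix as a list of 0/1 rows (checks x variables);
       every check must involve at least two variables.
    llr: channel log-likelihood ratios, one per variable node.
    Returns (hard-decision bits, number of iterations used).
    """
    m, n = len(H), len(H[0])
    # Variable nodes attached to each check node (the Tanner graph edges).
    checks = [[j for j in range(n) if H[i][j]] for i in range(m)]
    # Check-to-variable messages, initialised to zero.
    c2v = {(i, j): 0.0 for i in range(m) for j in checks[i]}
    bits = [0 if x >= 0 else 1 for x in llr]
    for it in range(1, max_iter + 1):
        # Total belief = channel LLR + all incoming check messages.
        total = list(llr)
        for (i, j), msg in c2v.items():
            total[j] += msg
        # Variable-to-check: exclude the message that came from the target check.
        v2c = {edge: total[edge[1]] - msg for edge, msg in c2v.items()}
        # Check-to-variable: sign product and minimum magnitude of the other inputs.
        for i in range(m):
            for j in checks[i]:
                others = [v2c[(i, k)] for k in checks[i] if k != j]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                c2v[(i, j)] = sign * min(abs(x) for x in others)
        # A-posteriori LLR and hard decision.
        total = list(llr)
        for (i, j), msg in c2v.items():
            total[j] += msg
        bits = [0 if t >= 0 else 1 for t in total]
        # Early termination: stop once every parity check is satisfied.
        if all(sum(bits[j] for j in checks[i]) % 2 == 0 for i in range(m)):
            return bits, it
    return bits, max_iter


# Toy example: all-zero codeword, last bit received with the wrong sign.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
llr = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, -0.5]
decoded, iterations = min_sum_decode(H, llr)
print(decoded, iterations)
```

In this toy case the single unreliable bit is corrected and the decoder stops after one iteration. The nested loops over checks and their variable nodes are where the data-dependent, irregular memory accesses of a real decoder come from.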
The comparison shows that the GPU code is 2.4 times faster than the AVX2 implementation, whilst the FPGA is twice as fast as the software solution. However, the FPGA acceleration is worse than both the GPU and the AVX2 ones. The main reason for the poor FPGA performance is the code structure, which turned out to be a worst-case scenario for FPGAs. Code well suited to an FPGA should have inner loops with a fixed loop bound, whilst the outermost loop can be variable. If the inner loops have a static number of iterations, they can be unrolled and the memories can be partitioned proportionally to obtain the maximum parallelism. The CUDA code presents the opposite scenario, and Vivado HLS is not able to partition memory objects that are accessed with a non-constant index inside loops with a variable loop bound. The logic introduced to access the memory banks almost triples the latency of the application. Moreover, the on-chip memory is too small to partition these arrays completely. Finally, loop unrolling is used without array partitioning, and work-item pipelining is applied to reduce the latency of the work-item loop. The final results show that the FPGA decoder is still slower than the GPU solution by a factor of 382. Given the structure of the source code, the OpenCL decoder cannot reach the performance of the GPU one, since unrolling and array partitioning cannot be combined.
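The argument about fixed loop bounds and array partitioning can be made concrete with a small model of cyclic partitioning, written in Python for illustration. It assumes one read port per bank (real BRAM banks are often dual-ported) and uses hypothetical helper names; it does not reproduce what Vivado HLS generates, it only counts how many clock cycles a group of parallel reads issued by an unrolled loop would need.

```python
# Cyclic array partitioning model (illustrative; helper names are hypothetical,
# not part of any Vivado HLS API). Element k of a partitioned array lives in
# bank k % P. Assumption: each bank serves one read per clock cycle.

from collections import Counter


def bank_of(index, num_banks):
    """Bank holding array[index] under cyclic partitioning."""
    return index % num_banks


def cycles_needed(indices, num_banks):
    """Cycles to serve a group of reads issued together by an unrolled loop."""
    reads_per_bank = Counter(bank_of(k, num_banks) for k in indices)
    return max(reads_per_bank.values())


P = 4  # partition factor equal to the constant inner trip count

# Fixed-bound inner loop `for j in range(P): a[base + j]`, fully unrolled:
# the P consecutive indices always land in P different banks.
base = 8
regular = [base + j for j in range(P)]

# Data-dependent indices (e.g. read from a connectivity table): the bank of
# each access is only known at run time, and several reads may collide.
irregular = [3, 7, 11, 2]

print(cycles_needed(regular, P))    # 1: no bank conflict
print(cycles_needed(irregular, P))  # 3: three reads hit bank 3
```

With a constant trip count equal to the partition factor, any window of consecutive indices is conflict-free, so all unrolled reads complete in one cycle. With run-time indices the tool cannot prove this at compile time, so it has to add multiplexing and arbitration logic around the banks, which is the kind of overhead the abstract reports as nearly tripling the latency.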

Supervisors: Luciano Lavagno, Salvatore Scarpina
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 109
Degree programme: Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering)
Degree class: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/16046