Lightweight Vector Extension for Efficient Neural Network Inference on RISC-V

Alessio Caviglia

Lightweight Vector Extension for Efficient Neural Network Inference on RISC-V.

Supervisors: Maurizio Martina, Guido Masera, Michele Caon, Emanuele Valpreda, Flavia Guella. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2024

Abstract:

In recent years, Neural Networks (NNs) have spread across various domains, including agriculture, wearable devices, and smart cities. This widespread adoption has created a growing need to reduce data transfer bottlenecks, driving increasing efforts to shift computation from the cloud to edge devices, which are, however, constrained by power, area, and cost. Vector processors offer a promising solution to the challenges posed by edge computing by balancing performance and resource efficiency. In this work, the CV32E20, a RISC-V scalar core originally designed for embedded applications, is extended with a subset of the RISC-V "V" Vector Extension (RVV) to leverage the benefits of vector processing on data-intensive tasks. As the primary objective of this work is to improve performance while keeping complexity limited, resource sharing between the vector and scalar pipelines is maximized. Consequently, vector operations are processed sequentially, avoiding the need to duplicate functional units. Moreover, since the architecture is conceived to speed up NN inference, the original execution unit is enhanced to support Single Instruction Multiple Data (SIMD) operations on 8-bit and 16-bit elements and Multiply-Accumulate (MAC) instructions. Performance is further improved by interleaving memory accesses during vector instructions to reduce idle cycles and improve throughput. Compared to the original scalar core, the extended architecture shows a 16% area overhead, with the most significant contribution coming from the Vector Register File interface, which controls vector instructions. The added support for single-cycle MAC instructions, together with SIMD, causes the critical path to increase by 31%. For performance evaluation, the core is integrated and tested within X-HEEP, a configurable RISC-V-based microcontroller.
Significant cycle reductions are demonstrated for basic vector operations: a 40% decrease in cycles with vectors of 16 elements, each 32 bits wide, and an 86% decrease with vectors of 128 elements, each 8 bits wide. The cycle improvement is mainly attributable to the reduced number of memory and control instructions. Overall, additional energy savings are expected on complex operations, such as matrix multiplications, due to the lower number of accesses to the instruction memory.
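
As an illustration of the SIMD and MAC support described in the abstract, the following is a minimal behavioural sketch in Python, not the thesis's hardware design. It assumes a 32-bit datapath (the CV32E20 is a 32-bit core), signed elements, and wrap-around arithmetic; the function names (`split_lanes`, `pack_lanes`, `simd_mac`, `vector_mac`, `words_needed`) are hypothetical and chosen only for this example. The lane-wise operation is analogous to a multiply-accumulate such as RVV's `vmacc.vv`, but the abstract does not state which instructions the implemented subset includes.

```python
WORD_BITS = 32  # assumption: one 32-bit word passes through the shared unit per step


def split_lanes(word, sew):
    """Split a 32-bit word into signed lanes of `sew` bits (8, 16, or 32)."""
    mask = (1 << sew) - 1
    lanes = []
    for i in range(WORD_BITS // sew):
        lane = (word >> (i * sew)) & mask
        if lane >= 1 << (sew - 1):  # reinterpret the lane as two's complement
            lane -= 1 << sew
        lanes.append(lane)
    return lanes


def pack_lanes(lanes, sew):
    """Pack signed lanes back into one 32-bit word, truncating each to `sew` bits."""
    mask = (1 << sew) - 1
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & mask) << (i * sew)
    return word


def simd_mac(acc, a, b, sew):
    """Lane-wise acc + a * b on one 32-bit word, wrapping each lane to `sew` bits."""
    result = [
        c + x * y
        for c, x, y in zip(split_lanes(acc, sew), split_lanes(a, sew), split_lanes(b, sew))
    ]
    return pack_lanes(result, sew)


def vector_mac(acc_words, a_words, b_words, sew):
    """Process a vector sequentially, one word at a time, reusing a single MAC unit."""
    return [simd_mac(c, a, b, sew) for c, a, b in zip(acc_words, a_words, b_words)]


def words_needed(num_elements, sew):
    """Number of 32-bit words (sequential steps in this model) that a vector occupies."""
    return (num_elements * sew + WORD_BITS - 1) // WORD_BITS
```

Under these assumptions, a vector of 128 elements of 8 bits occupies 32 words and a vector of 16 elements of 32 bits occupies 16 words, so narrower elements pack more work into each pass through the shared unit. This word count is only a rough model and does not reproduce the cycle figures reported in the abstract.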

Supervisors: Maurizio Martina, Guido Masera, Michele Caon, Emanuele Valpreda, Flavia Guella
Academic year: 2024/25
Publication type: Electronic
Number of Pages: 77
Additional Information: Thesis under embargo. Full text not available
Subjects:
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/32949