Sergio Mazzola
ISA extensions in the Snitch Processor for Signal Processing.
Supervisors: Alberto Macii, Luca Benini, Samuel Riedel, Matheus De Araujo Cavalcante. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2021
PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract: |
The last decades have seen a growing interest in data processing in power-constrained environments with strict timing requirements. Mobile devices are the leading example of this trend, as they strive for high performance in media applications. Smartphone cameras feature image sensors with tens of millions of pixels, imposing huge image and video processing loads that must be handled within tight power budgets, often in real time. However, the nature of such loads and their inherent parallelism can be exploited to meet power and timing constraints. To this end, specialized platforms called image signal processors (ISPs) have increasingly gained attention, with their highly parallel architectures and domain-specific instructions. Several architectures for L1-shared clusters have been developed. However, they either do not scale beyond tens of cores or solve memory sharing with a deeper memory hierarchy, leading to a significant degradation in access latency. Specialized architectures that achieve both scalability and low access latency are restricted to a specific family of algorithms by their over-restrictive interconnect. MemPool is a 32-bit many-core system that scales up to 256 cores sharing a large pool of scratchpad memory (SPM) through a low-latency, hierarchical interconnect. Despite its general-purpose architecture and high core count, MemPool reaches very competitive performance and efficiency with respect to the state of the art. Its smallest unit of repetition is the MemPool core complex (CC), featuring an RV32IMA Snitch core. Snitch is a tiny in-order, single-issue core based on the RISC-V open instruction set architecture (ISA). It is paired with an application-tunable accelerator, whose pipeline is fully decoupled from the core, making the MemPool CC a pseudo-dual-issue system. In this work, we present Xpulpimg, an extension of the Snitch instruction set including domain-specific instructions for digital signal processing (DSP). 
DSP is particularly useful for image processing, and can thus exploit the full potential of the MemPool system as an ISP. The introduced DSP instructions have been carefully selected from the Xpulp custom RISC-V extension for DSP, based on their impact on software of interest. Post-synthesis figures of the MemPool CC were closely taken into account during the micro-architectural design exploration. In particular, the Xpulpimg extension adds to Snitch new addressing modes for load and store instructions, single-instruction-multiple-data (SIMD) operations for 16-bit and 8-bit sub-words, and additional utilities for DSP, such as multiply-accumulate, clips, and comparisons. Given the open and modular nature of the standard RISC-V ISA, a main concern of our project was to keep the whole environment as modular and extensible as possible while granting full support for the new extension. In doing so, we propose a framework for opcode space management, ISA modeling and simulation, verification, and compilation support. To evaluate the proposed extension, we benchmarked the MemPool cluster in several configurations with DSP algorithms optimized for Xpulpimg, measuring a speed-up of up to 4.4× with respect to the initial design. We synthesized the MemPool CC for the GF22FDX technology, aiming to obtain a Pareto-optimal design in terms of area and frequency. In typical conditions, we measured a maximum operating frequency degradation of 3.6%. At the target frequency of 500 MHz, the extended co-processor accounts for 23 kGE. |
---|---|
Supervisors: | Alberto Macii, Luca Benini, Samuel Riedel, Matheus De Araujo Cavalcante |
Academic year: | 2020/21 |
Publication type: | Electronic |
Number of Pages: | 98 |
Subjects: | |
Degree programme: | Master's degree programme in Computer Engineering (Ingegneria Informatica) |
Degree class: | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
Joint supervision institution: | ETH Zurich (Switzerland) |
Collaborating companies: | ETH Zurich |
URI: | http://webthesis.biblio.polito.it/id/eprint/18144 |
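Illustrative note on the operation classes named in the abstract: the sketch below is a plain-software model of what SIMD operations on 16-bit and 8-bit sub-words, multiply-accumulate, and clip do to the contents of a 32-bit register. It is not taken from the thesis. The function names (`simd_add`, `mac`, `clip`, and the helpers) are hypothetical and are not Xpulpimg mnemonics, and the exact semantics of the real instructions (for example, how a clip range is encoded) are not stated in the abstract and are assumed here only for illustration.

```python
# Plain-software model of packed sub-word operations on a 32-bit register.
# All names are hypothetical; they are NOT Xpulpimg mnemonics or encodings.

MASK32 = 0xFFFFFFFF


def split_lanes(word, lane_bits):
    """Split a 32-bit word into unsigned lanes, least-significant lane first."""
    lane_mask = (1 << lane_bits) - 1
    return [(word >> shift) & lane_mask for shift in range(0, 32, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack lanes (least-significant first) back into one 32-bit word."""
    lane_mask = (1 << lane_bits) - 1
    word = 0
    for index, lane in enumerate(lanes):
        word |= (lane & lane_mask) << (index * lane_bits)
    return word & MASK32


def to_signed(value, bits):
    """Interpret an unsigned `bits`-wide value as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def simd_add(a, b, lane_bits):
    """Lane-wise wrapping add: carries never cross a lane boundary."""
    sums = [x + y for x, y in zip(split_lanes(a, lane_bits),
                                  split_lanes(b, lane_bits))]
    return join_lanes(sums, lane_bits)


def mac(acc, a, b):
    """Scalar multiply-accumulate on signed values, wrapped to 32 bits."""
    result = (to_signed(acc & MASK32, 32)
              + to_signed(a & MASK32, 32) * to_signed(b & MASK32, 32))
    return to_signed(result & MASK32, 32)


def clip(value, low, high):
    """Saturate value to the closed range [low, high] (assumed semantics)."""
    return max(low, min(high, value))
```

With 16-bit lanes, `simd_add(0x0001FFFF, 0x00010001, 16)` wraps the low lane to zero without carrying into the high lane, which is the property that lets one 32-bit datapath process two 16-bit or four 8-bit pixels per instruction.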