An exploration on connectivity and efficiency in Coarse-Grain Reconfigurable Architectures

Mattia Cozzolino

Supervisors: Maurizio Martina, Luigi Giuffrida. Politecnico di Torino, Master's degree in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica), 2025

Full text: PDF (Tesi_di_laurea), 4 MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

In recent years, hardware architectures have evolved significantly, with growing attention toward reconfigurable and parallelized models: systems capable of balancing flexibility and performance. This demand has made Coarse-Grain Reconfigurable Architectures (CGRAs) one of the most widely adopted approaches, as they can adapt to different computational problems and are particularly suitable for numerical and data-parallel applications. This thesis focuses on the design of a parametric CGRA composed of an N×N matrix of Processing Elements (PEs), in which both the matrix size and the interconnectivity between PEs can be modified. A Direct Memory Access (DMA) subsystem was integrated to generate AXI-Stream data flows, ensuring efficient frame loading and continuous data transfer toward the matrix.
A significant part of the work was devoted to the interconnections among the PEs. The analysis compares four interconnection topologies, MESH4, D-MESH, D-TORUS, and FULL, in terms of area, timing, power consumption, and potential for parallelism. These configurations were implemented at the register-transfer level (RTL) to observe how increasing the internal connectivity affects overall performance and design complexity. All modifications remain compatible with the existing testbench and are synthesizable with Synopsys Design Compiler, allowing a direct transition to Place & Route tools. The synthesis results show that moving from the FULL topology to MESH4 reduces area by about 40% and improves timing by around 10%, while D-TORUS proves to be the best compromise among area, timing, and effective parallelism. D-MESH lies between MESH4 and D-TORUS without offering a significant advantage in any particular metric.
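The DMA-to-matrix path relies on the AXI-Stream valid/ready handshake: a word (a "beat") is transferred only in a clock cycle in which the source asserts TVALID and the sink asserts TREADY. The Python sketch below is a hypothetical cycle-level model, not the thesis RTL. The names `stream_frame` and `Beat` are illustrative, the frame is assumed to be a flat list of words, the source is assumed to keep TVALID high while words remain, and TLAST is assumed to mark the last word of each frame.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Beat:
    cycle: int   # clock cycle in which the handshake completed
    tdata: int   # payload word carried on TDATA
    tlast: bool  # True only on the final word of the frame


def stream_frame(frame: Sequence[int], tready: Sequence[bool]) -> List[Beat]:
    """Hypothetical cycle-level model of an AXI-Stream source sending one frame.

    The source is assumed to hold TVALID high while words remain, so a beat
    is transferred exactly in the cycles where the sink drives TREADY high.
    """
    beats: List[Beat] = []
    idx = 0  # index of the word currently presented on TDATA
    for cycle, ready in enumerate(tready):
        if idx == len(frame):
            break  # frame fully sent; the source would now drop TVALID
        if ready:  # TVALID and TREADY both high: the word is accepted
            beats.append(Beat(cycle, frame[idx], idx == len(frame) - 1))
            idx += 1
    if idx < len(frame):
        raise RuntimeError("TREADY pattern too short to drain the whole frame")
    return beats


# The sink stalls in cycle 1, so the three words complete in cycles 0, 2 and 3.
for beat in stream_frame([10, 20, 30], [True, False, True, True]):
    print(beat)
```

With TREADY held high in every cycle, one word moves per cycle, which is the constant-throughput case; each cycle in which the sink deasserts TREADY inserts a one-cycle stall without losing data.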
From a functional standpoint, Verilator simulations confirm the correct behavior of the system and its ability to execute multiple operations in parallel while sustaining a constant throughput. Overall, the analysis indicates that, as the number of PEs grows, regular topologies with a limited node degree are preferable, since they scale well and are easier to integrate into complex systems. This approach paves the way for efficient reconfigurable accelerators, with potential applications in DSP, machine learning, and high-performance embedded systems.
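The scalability argument can be made concrete by counting PE-to-PE links as the array grows. The Python sketch below rests on assumptions that the abstract does not state: MESH4 is taken as a 4-neighbor (north/south/east/west) mesh, D-MESH adds the four diagonal neighbors, D-TORUS is the same 8-neighbor pattern with wraparound at the array edges, and FULL is read as all-to-all connectivity. The thesis RTL may define these topologies differently (FULL, for instance, may be more restricted than all-to-all), and the helpers `neighbors`, `link_count`, and `max_degree` are hypothetical.

```python
from itertools import product
from typing import Set, Tuple

PE = Tuple[int, int]  # (row, column) of a processing element
TOPOLOGIES = ("MESH4", "D-MESH", "D-TORUS", "FULL")

ORTHOGONAL = [(-1, 0), (1, 0), (0, -1), (0, 1)]
DIAGONAL = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbors(r: int, c: int, n: int, topology: str) -> Set[PE]:
    """PEs directly reachable from (r, c) under an assumed reading of each topology."""
    if topology not in TOPOLOGIES:
        raise ValueError(f"unknown topology: {topology}")
    if topology == "FULL":  # assumed: all-to-all connectivity
        return {(i, j) for i, j in product(range(n), repeat=2) if (i, j) != (r, c)}
    offsets = ORTHOGONAL + (DIAGONAL if topology in ("D-MESH", "D-TORUS") else [])
    result: Set[PE] = set()
    for dr, dc in offsets:
        rr, cc = r + dr, c + dc
        if topology == "D-TORUS":
            rr, cc = rr % n, cc % n  # edge PEs wrap around to the opposite side
        elif not (0 <= rr < n and 0 <= cc < n):
            continue  # mesh variants: no link beyond the array boundary
        if (rr, cc) != (r, c):
            result.add((rr, cc))
    return result


def link_count(n: int, topology: str) -> int:
    """Number of distinct undirected PE-to-PE links in an n x n array."""
    links = set()
    for r, c in product(range(n), repeat=2):
        for nb in neighbors(r, c, n, topology):
            links.add(frozenset({(r, c), nb}))
    return len(links)


def max_degree(n: int, topology: str) -> int:
    """Largest number of direct neighbors of any single PE."""
    return max(len(neighbors(r, c, n, topology)) for r, c in product(range(n), repeat=2))


for n in (4, 8, 16):
    row = ", ".join(f"{t}={link_count(n, t)} links (deg {max_degree(n, t)})" for t in TOPOLOGIES)
    print(f"N={n}: {row}")
```

Under these assumptions the mesh and torus variants keep a fixed per-PE degree (4 or 8), so their link count grows roughly linearly with the number of PEs, whereas all-to-all links grow as N²(N² - 1)/2. This is consistent with the preference for regular, limited-degree topologies in larger arrays, although link count is only a coarse proxy for the synthesized area and timing.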

Supervisors: Maurizio Martina, Luigi Giuffrida
Academic year: 2025/26
Publication type: Electronic
Number of pages: 88
Subjects:
Degree programme: Master's degree in Electronic Engineering (Corso di laurea magistrale in Ingegneria Elettronica)
Degree class: New regulations > Master's degree > LM-29 - Electronic Engineering (Ingegneria Elettronica)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/38727