Subgraph Isomorphism Acceleration on HBM-based Data Center FPGAs using High-Level Synthesis

Shahabuddin Danish


Supervisor: Luciano Lavagno. Politecnico di Torino, Master's degree in Electronic Engineering, 2025


License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	Subgraph isomorphism is a fundamental NP-hard problem in graph theory with applications such as social network analysis and bioinformatics, and it poses a significant computational challenge on very large, non-uniform datasets. Field-programmable gate arrays (FPGAs) offer energy-efficient platforms for building specialized hardware that accelerates graph workloads, but the performance of graph accelerators is often limited by the bandwidth of the memory subsystem. Unlike traditional exploration-based methods, Worst-Case Optimal Join (WCOJ) algorithms bound the size of intermediate results, which shifts the primary performance bottleneck to memory access. This thesis addresses that memory bottleneck by studying the feasibility of extending an existing low-power, high-performance WCOJ-based subgraph isomorphism accelerator, originally designed for embedded FPGAs with a 128-bit memory interface, so that it can be deployed on modern data center FPGA platforms equipped with High Bandwidth Memory (HBM). The redesign is motivated by a preliminary benchmark study of the memory subsystems of AMD Alveo™ platforms, specifically the HBM2 on the Alveo™ U55C and the DDR4 on the Alveo™ U250. The study showed that HBM provides approximately 6.5x higher aggregate sequential bandwidth (~382 GB/s) than DDR4 (~59 GB/s), with comparable read latency and significantly lower write latency. These findings confirm that the original kernel's 128-bit interface would severely underutilize the target platform and that a wider datapath is necessary to achieve maximum performance. The primary contribution of this work is a comprehensive architectural redesign of the kernel's complete datapath to natively support a 512-bit physical memory bus, allowing full utilization of the target platform's memory subsystem.
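As an illustration of the WCOJ idea mentioned above, the following is a minimal sketch of a generic-join style evaluation of the simplest pattern, a triangle, in portable C++. It is not the thesis's implementation: it assumes sorted adjacency lists in a CSR layout, whereas the accelerator stores the data graph in hash tables, and the names `CsrGraph`, `build_csr`, `intersect` and `list_triangles` are hypothetical. Pattern vertices are bound one at a time, and candidates for the last vertex come from intersecting neighbour lists (iterating the shorter list and probing the longer one), so no partial match that violates an edge constraint is materialized and most of the work consists of reading adjacency data from memory.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical CSR graph: neighbours of v are col[row[v] .. row[v + 1]), sorted ascending.
struct CsrGraph {
    std::vector<uint32_t> row;
    std::vector<uint32_t> col;
};

// Build an undirected CSR graph from an edge list (each edge stored in both directions).
CsrGraph build_csr(uint32_t n, const std::vector<std::pair<uint32_t, uint32_t>>& edges) {
    std::vector<std::vector<uint32_t>> adj(n);
    for (const auto& [u, v] : edges) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    CsrGraph g;
    g.row.push_back(0);
    for (auto& a : adj) {
        std::sort(a.begin(), a.end());
        g.col.insert(g.col.end(), a.begin(), a.end());
        g.row.push_back(static_cast<uint32_t>(g.col.size()));
    }
    return g;
}

// Intersect two sorted ranges by iterating the shorter one and binary-searching
// the longer one, so the cost scales with the smaller list (times a log factor).
template <typename It>
std::vector<uint32_t> intersect(It a_begin, It a_end, It b_begin, It b_end) {
    if (std::distance(a_begin, a_end) > std::distance(b_begin, b_end)) {
        std::swap(a_begin, b_begin);
        std::swap(a_end, b_end);
    }
    std::vector<uint32_t> out;
    for (It it = a_begin; it != a_end; ++it)
        if (std::binary_search(b_begin, b_end, *it)) out.push_back(*it);
    return out;
}

// Triangle pattern (a, b, c) with a < b < c, bound one vertex at a time:
// a from all vertices, b from N(a), c from intersect(N(a), N(b)).
std::vector<std::array<uint32_t, 3>> list_triangles(const CsrGraph& g) {
    std::vector<std::array<uint32_t, 3>> out;
    const uint32_t n = static_cast<uint32_t>(g.row.size()) - 1;
    for (uint32_t a = 0; a < n; ++a) {
        auto na_begin = g.col.begin() + g.row[a];
        auto na_end = g.col.begin() + g.row[a + 1];
        for (auto it = na_begin; it != na_end; ++it) {
            const uint32_t b = *it;
            if (b <= a) continue;  // enforce a < b so each triangle is reported once
            auto nb_begin = g.col.begin() + g.row[b];
            auto nb_end = g.col.begin() + g.row[b + 1];
            for (uint32_t c : intersect(na_begin, na_end, nb_begin, nb_end))
                if (c > b) out.push_back({a, b, c});
        }
    }
    return out;
}
```

In a hardware pipeline the intersection step is the one that streams neighbour data from memory, which is why this class of algorithm benefits directly from a wider memory datapath.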
The redesign keeps the original graph data structures and the 128-bit logical instructions for graph primitives (vertices and edges), and packs four such instructions into each 512-bit memory word. This packing/unpacking methodology is applied to the complete datapath of the accelerator, from host-side data preparation to the on-chip processing modules, and it required significant modifications to the kernel's two major phases: preprocessing and the multiway join pipelines. The preprocessing stage is redesigned to unpack the logical 128-bit data graph instructions from incoming 512-bit words before sorting them and scattering them into the final hash table structures, and the pipelined multiway join is redesigned to consume these wider data structures. The resulting implementation is a fully pipelined, natively 512-bit WCOJ accelerator written in C++ with Vitis™ High-Level Synthesis; by aligning the kernel's datapath with the physical memory interface, it optimizes data movement to use the full memory bandwidth of data center FPGA platforms. The thesis also demonstrates a reproducible methodology for migrating and optimizing other memory-bound HLS designs for high-performance computing environments.
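The packing scheme described above can be sketched in portable C++ rather than Vitis HLS. This is a minimal sketch under stated assumptions, not the thesis code: the 512-bit memory word is modelled as eight 64-bit limbs (an actual HLS kernel would typically use `ap_uint<512>`), a 128-bit logical instruction is two limbs, lane `i` is assumed to occupy bits [128·i+127 : 128·i], and the names `Instr128`, `Word512`, `pack_words` and `unpack_words` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 128-bit logical instruction (one vertex or edge record),
// stored as two halves: lo = bits [63:0], hi = bits [127:64].
struct Instr128 {
    uint64_t lo = 0;
    uint64_t hi = 0;
    bool operator==(const Instr128& o) const { return lo == o.lo && hi == o.hi; }
};

constexpr std::size_t kLanes = 4;  // 512 / 128 logical instructions per memory word

// Hypothetical 512-bit memory word as eight 64-bit limbs:
// limb 0 holds bits [63:0], limb 7 holds bits [511:448].
struct Word512 {
    std::array<uint64_t, 8> limb{};
};

// Host side: pack instructions four per word; lane i uses limbs 2*i and 2*i+1.
// The last word is zero-padded when the count is not a multiple of four.
std::vector<Word512> pack_words(const std::vector<Instr128>& instrs) {
    std::vector<Word512> words((instrs.size() + kLanes - 1) / kLanes);
    for (std::size_t i = 0; i < instrs.size(); ++i) {
        Word512& w = words[i / kLanes];
        const std::size_t lane = i % kLanes;
        w.limb[2 * lane] = instrs[i].lo;
        w.limb[2 * lane + 1] = instrs[i].hi;
    }
    return words;
}

// Kernel side (e.g. the preprocessing stage): recover `count` logical instructions
// from the 512-bit words, skipping padding lanes. In HLS the lane loop would
// typically be unrolled (#pragma HLS UNROLL) so all four lanes are read per word.
std::vector<Instr128> unpack_words(const std::vector<Word512>& words, std::size_t count) {
    std::vector<Instr128> out;
    out.reserve(count);
    for (const Word512& w : words) {
        for (std::size_t lane = 0; lane < kLanes && out.size() < count; ++lane) {
            out.push_back({w.limb[2 * lane], w.limb[2 * lane + 1]});
        }
    }
    return out;
}
```

With this layout each 512-bit memory transfer delivers four logical instructions instead of one, so at the same kernel clock the widened interface moves four times as much graph data per transfer as the original 128-bit interface, while the 128-bit instruction format used by the rest of the design stays unchanged.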
Supervisor:	Luciano Lavagno
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	59
Degree programme:	Master's degree in Electronic Engineering
Degree class:	New system > Master's degree > LM-29 - Electronic Engineering
Collaborating organizations:	Politecnico di Torino
URI:	http://webthesis.biblio.polito.it/id/eprint/37627
