TY - JOUR
T1 - A Unified Engine for Accelerating GNN Weighting/Aggregation Operations, With Efficient Load Balancing and Graph-Specific Caching
AU - Mondal, Sudipta
AU - Manasi, Susmita Dey
AU - Kunal, Kishor
AU - Ramprasath, S.
AU - Zeng, Ziqing
AU - Sapatnekar, Sachin S.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
AB - Graph neural network (GNN) analysis engines are vital for real-world problems that use large graph models. Challenges for a GNN hardware platform include the ability to 1) host a variety of GNNs; 2) handle high sparsity in input vertex feature vectors and the graph adjacency matrix, along with the accompanying random memory access patterns; and 3) maintain load-balanced computation in the face of uneven workloads induced by high sparsity and power-law vertex degree distributions. This article proposes GNNIE, an accelerator designed to run a broad range of GNNs. It tackles workload imbalance by 1) splitting vertex feature operands into blocks; 2) reordering and redistributing computations; and 3) using a novel flexible MAC architecture. It adopts a graph-specific, degree-aware caching policy that is well suited to real-world graph characteristics. The policy enhances on-chip data reuse and avoids random memory accesses to DRAM. Across multiple datasets, GNNIE achieves average speedups of 7197× over a CPU and 17.81× over a GPU on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool. Compared with prior approaches, GNNIE achieves an average speedup of 5× over HyGCN (which cannot implement GATs) on GCN, GraphSAGE, and GINConv, and an average speedup of 1.3× over AWB-GCN (which runs only GCNs) despite using 3.4× fewer processing units.
KW - Graph neural network (GNN)
KW - graph-specific caching
KW - hardware accelerator
KW - load balancing
UR - http://www.scopus.com/inward/record.url?scp=85146231073&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146231073&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2022.3232467
DO - 10.1109/TCAD.2022.3232467
M3 - Article
AN - SCOPUS:85146231073
SN - 0278-0070
VL - 42
SP - 4844
EP - 4857
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 12
ER -