TY - JOUR
T1 - A Unified Engine for Accelerating GNN Weighting/Aggregation Operations, With Efficient Load Balancing and Graph-Specific Caching
AU - Mondal, Sudipta
AU - Manasi, Susmita Dey
AU - Kunal, Kishor
AU - Ramprasath, S.
AU - Zeng, Ziqing
AU - Sapatnekar, Sachin S.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
AB - Graph neural network (GNN) analysis engines are vital for real-world problems that use large graph models. Challenges for a GNN hardware platform include the ability to 1) host a variety of GNNs; 2) handle high sparsity in input vertex feature vectors and the graph adjacency matrix, along with the accompanying random memory access patterns; and 3) maintain load-balanced computation in the face of uneven workloads induced by high sparsity and power-law vertex degree distributions. This article proposes GNNIE, an accelerator designed to run a broad range of GNNs. It tackles workload imbalance by 1) splitting vertex feature operands into blocks; 2) reordering and redistributing computations; and 3) using a novel flexible MAC architecture. It adopts a graph-specific, degree-aware caching policy that is well suited to real-world graph characteristics. The policy enhances on-chip data reuse and avoids random memory accesses to DRAM. Across multiple datasets, GNNIE achieves average speedups of 7197× over a CPU and 17.81× over a GPU on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool. Compared with prior approaches, GNNIE achieves an average speedup of 5× over HyGCN (which cannot implement GATs) on GCN, GraphSAGE, and GINConv, and an average speedup of 1.3× over AWB-GCN (which runs only GCNs) despite using 3.4× fewer processing units.
KW - Graph neural network (GNN)
KW - graph-specific caching
KW - hardware accelerator
KW - load balancing
UR - http://www.scopus.com/inward/record.url?scp=85146231073&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146231073&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2022.3232467
DO - 10.1109/TCAD.2022.3232467
M3 - Article
AN - SCOPUS:85146231073
SN - 0278-0070
VL - 42
SP - 4844
EP - 4857
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 12
ER -