Distributed Hybrid CPU and GPU training for Graph Neural Networks on Billion-Scale Heterogeneous Graphs

Da Zheng, Xiang Song, Chengru Yang, Dominique Lasalle, George Karypis

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

Graph neural networks (GNNs) have shown great success in learning from graph-structured data. They are widely used in various applications, such as recommendation, fraud detection, and search. In these domains, the graphs are typically large and heterogeneous, containing many millions or billions of vertices and edges of different types. To tackle this challenge, we develop DistDGLv2, a system that extends DistDGL for training GNNs on massive heterogeneous graphs in a mini-batch fashion, using distributed hybrid CPU/GPU training. DistDGLv2 places graph data in distributed CPU memory and performs mini-batch computation in GPUs. For ease of use, DistDGLv2 adopts an API compatible with Deep Graph Library (DGL)'s mini-batch training and heterogeneous graph API, which enables distributed training with almost no code modification. To ensure model accuracy, DistDGLv2 follows a synchronous training approach and allows the ego-networks forming mini-batches to include non-local vertices. To ensure data locality and load balancing, DistDGLv2 partitions heterogeneous graphs using a multi-level partitioning algorithm that minimizes edge cut under multiple balancing constraints. DistDGLv2 deploys an asynchronous mini-batch generation pipeline that overlaps computation and data access to fully utilize all hardware (CPU, GPU, network, PCIe). We demonstrate DistDGLv2 on various GNN workloads. Our results show that DistDGLv2 achieves a 2-3× speedup over DistDGL and an 18× speedup over Euler. It takes only 5-10 seconds to complete an epoch on graphs with hundreds of millions of vertices on a cluster with 64 GPUs.
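Because DistDGLv2 keeps DGL's mini-batch and heterogeneous-graph APIs, a distributed training script looks almost identical to a single-machine one. The sketch below illustrates that pattern for a node-classification workload, assuming DGL's distributed module; the graph name, feature fields, fan-outs, and hyperparameters are hypothetical, and class names such as DistNodeDataLoader vary slightly across DGL releases:

```python
# Hedged sketch of distributed mini-batch GNN training in the DGL style
# described in the abstract. 'my_graph', the feature fields, fan-outs,
# and hyperparameters are made up for illustration.
import dgl
import dgl.nn as dglnn
import torch as th
import torch.nn as nn
import torch.nn.functional as F

class SAGE(nn.Module):
    """An ordinary two-layer GraphSAGE model, unchanged from single-machine DGL."""
    def __init__(self, in_feats, n_hidden, n_classes):
        super().__init__()
        self.layer1 = dglnn.SAGEConv(in_feats, n_hidden, 'mean')
        self.layer2 = dglnn.SAGEConv(n_hidden, n_classes, 'mean')

    def forward(self, blocks, x):
        h = F.relu(self.layer1(blocks[0], x))
        return self.layer2(blocks[1], h)

# Join the distributed runtime; ip_config.txt lists the cluster machines.
# (DGL's launch script normally starts the processes and sets the
# environment variables that torch.distributed needs.)
dgl.distributed.initialize('ip_config.txt')
th.distributed.init_process_group(backend='gloo')

# The partitioned graph lives in distributed CPU memory across the cluster.
g = dgl.distributed.DistGraph('my_graph', part_config='data/my_graph.json')

# Each trainer takes its local share of the training vertices.
train_nids = dgl.distributed.node_split(g.ndata['train_mask'])

# Neighbor sampling builds the mini-batch ego-networks; sampled neighbors
# may live on remote partitions, matching the synchronous, non-local
# mini-batch construction the abstract describes.
sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10])
dataloader = dgl.dataloading.DistNodeDataLoader(
    g, train_nids, sampler, batch_size=1024, shuffle=True)

dev = th.device('cuda:0')
model = SAGE(g.ndata['feat'].shape[1], 128, 16).to(dev)
model = th.nn.parallel.DistributedDataParallel(model, device_ids=[dev])
opt = th.optim.Adam(model.parameters(), lr=1e-3)

for input_nodes, output_nodes, blocks in dataloader:
    # Pull features and labels (possibly from remote machines) to the GPU;
    # the mini-batch computation itself runs on the GPU while the graph
    # stays in distributed CPU memory.
    x = g.ndata['feat'][input_nodes].to(dev)
    y = g.ndata['label'][output_nodes].to(dev).long()
    blocks = [b.to(dev) for b in blocks]
    loss = F.cross_entropy(model(blocks, x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Only the offline partitioning step (DGL exposes dgl.distributed.partition_graph with balance_ntypes and balance_edges options, in the spirit of the multi-constraint partitioning described above) and the DistGraph/dataloader wrappers differ from single-machine code; the model and training loop carry over, which is the API compatibility the abstract claims.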

Original language: English (US)
Title of host publication: KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 4582-4591
Number of pages: 10
ISBN (Electronic): 9781450393850
DOIs
State: Published - Aug 14 2022
Externally published: Yes
Event: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration: Aug 14 2022 - Aug 18 2022

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/Territory: United States
City: Washington
Period: 8/14/22 - 8/18/22

Bibliographical note

Publisher Copyright:
© 2022 ACM.

Keywords

  • distributed training
  • graph neural networks
