TY - GEN
T1 - Truss decomposition on shared-memory parallel systems
AU - Smith, Shaden
AU - Liu, Xing
AU - Ahmed, Nesreen K.
AU - Tom, Ancy Sarah
AU - Petrini, Fabrizio
AU - Karypis, George
PY - 2017/10/30
Y1 - 2017/10/30
N2 - The scale of data used in graph analytics grows at an unprecedented rate. More than ever, domain experts require efficient and parallel algorithms for tasks in graph analytics. One such task is the truss decomposition, which is a hierarchical decomposition of the edges of a graph and is closely related to the task of triangle enumeration. As evidenced by the recent GraphChallenge, existing algorithms and implementations for truss decomposition are insufficient for the scale of modern datasets. In this work, we propose a parallel algorithm for computing the truss decomposition of massive graphs on a shared-memory system. Our algorithm breaks a computation-efficient serial algorithm into several bulk-synchronous parallel steps which do not rely on atomics or other fine-grained synchronization. We evaluate our algorithm across a variety of synthetic and real-world datasets on a 56-core Intel Xeon system. Our serial implementation achieves over 1400 × speedup over the provided GraphChallenge serial benchmark implementation and is up to 28 × faster than the state-of-the-art shared-memory parallel algorithm.
AB - The scale of data used in graph analytics grows at an unprecedented rate. More than ever, domain experts require efficient and parallel algorithms for tasks in graph analytics. One such task is the truss decomposition, which is a hierarchical decomposition of the edges of a graph and is closely related to the task of triangle enumeration. As evidenced by the recent GraphChallenge, existing algorithms and implementations for truss decomposition are insufficient for the scale of modern datasets. In this work, we propose a parallel algorithm for computing the truss decomposition of massive graphs on a shared-memory system. Our algorithm breaks a computation-efficient serial algorithm into several bulk-synchronous parallel steps which do not rely on atomics or other fine-grained synchronization. We evaluate our algorithm across a variety of synthetic and real-world datasets on a 56-core Intel Xeon system. Our serial implementation achieves over 1400 × speedup over the provided GraphChallenge serial benchmark implementation and is up to 28 × faster than the state-of-the-art shared-memory parallel algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85041205217&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041205217&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2017.8091049
DO - 10.1109/HPEC.2017.8091049
M3 - Conference contribution
AN - SCOPUS:85041205217
T3 - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
BT - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
Y2 - 12 September 2017 through 14 September 2017
ER -