Scalable GPU graph traversal
Author(s) -
Duane Merrill,
Michael Garland,
Andrew Grimshaw
Publication year - 2012
Publication title -
citeseer x (the pennsylvania state university)
Language(s) - English
Resource type - Conference proceedings
ISSN - 0362-1340
DOI - 10.1145/2145816.2145832
Subject(s) - graph traversal , tree traversal , computer science , parallel computing , scalability , theoretical computer science , graph , out of core algorithm , breadth first search , distributed memory , computation , shared memory , algorithm , database
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter. We present a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V|+|E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations both CPU and GPU platforms.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom