A Case for Work-stealing on FPGAs with OpenCL Atomics
Author(s) - Nadesh Ramanathan, John Wickerson, Felix Winterstein, George A. Constantinides
Publication year - 2016
Publication title - Spiral (Imperial College London)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2847263.2847343
Subject(s) - computer science, speedup, parallel computing, field programmable gate array, load balancing (electrical power), block (permutation group theory), lock (firearm), operating system, embedded system, work (physics), mechanical engineering, geometry, mathematics, engineering, grid
We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize work-items not with locks, mutexes or critical sections, but instead with the atomic operations provided by Altera's OpenCL SDK. We evaluate work-stealing for FPGAs by synthesizing a K-means clustering algorithm on an Altera P385 D5 board, both with work-stealing and with a statically-partitioned load. When block RAM utilization is maximised in both cases, we find that work-stealing leads to a 1.5x speedup. This demonstrates that the ability to do load balancing at run-time can outweigh the drawback of using 'expensive' atomics on FPGAs. We hope that our case study will stimulate further research into the high-level synthesis of fine-grained, lock-free, concurrent programs.
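To make the idea of lock-free work-stealing concrete, the following is a minimal sketch of a fixed-size work-stealing deque in host-side C11. It is an illustration under stated assumptions, not the paper's kernel: it uses a Chase-Lev-style layout, which may differ from the variant the authors synthesized, and the names `deque_t`, `push`, `pop`, `steal` and `CAPACITY` are hypothetical. C11 `atomic_compare_exchange_strong` stands in for the compare-and-swap that OpenCL C exposes as `atomic_cmpxchg`.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define CAPACITY 64 /* hypothetical fixed queue size, chosen for illustration */

/* One deque per worker. The owner pushes and pops at `bottom`;
 * other workers (thieves) take tasks from `top`. */
typedef struct {
    atomic_long top;            /* index of the oldest task (steal end) */
    atomic_long bottom;         /* index of the next free slot (owner end) */
    atomic_int tasks[CAPACITY]; /* task identifiers in a circular buffer */
} deque_t;

static void deque_init(deque_t *d) {
    atomic_init(&d->top, 0);
    atomic_init(&d->bottom, 0);
}

/* Owner only: add a task at the bottom. Returns false if the deque is full. */
static bool push(deque_t *d, int task) {
    long b = atomic_load(&d->bottom);
    long t = atomic_load(&d->top);
    if (b - t >= CAPACITY) return false;
    atomic_store(&d->tasks[b % CAPACITY], task);
    atomic_store(&d->bottom, b + 1); /* publish the task to thieves */
    return true;
}

/* Owner only: take the most recently pushed task. */
static bool pop(deque_t *d, int *task) {
    long b = atomic_load(&d->bottom) - 1;
    atomic_store(&d->bottom, b); /* reserve the bottom slot */
    long t = atomic_load(&d->top);
    if (t > b) {                 /* deque was already empty */
        atomic_store(&d->bottom, t);
        return false;
    }
    *task = atomic_load(&d->tasks[b % CAPACITY]);
    if (t < b) return true;      /* more than one task left: no race */
    /* Exactly one task left: compete with thieves using compare-and-swap. */
    long expected = t;
    bool won = atomic_compare_exchange_strong(&d->top, &expected, t + 1);
    atomic_store(&d->bottom, b + 1); /* deque is empty either way */
    return won;
}

/* Any worker: try to take the oldest task from another worker's deque. */
static bool steal(deque_t *d, int *task) {
    long t = atomic_load(&d->top);
    long b = atomic_load(&d->bottom);
    if (t >= b) return false;    /* nothing to steal */
    int candidate = atomic_load(&d->tasks[t % CAPACITY]);
    /* Claim slot t; fails if the owner or another thief claimed it first. */
    if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) return false;
    *task = candidate;
    return true;
}
```

In this sketch, a worker would call `pop` on its own deque until it returns false and then call `steal` on the other workers' deques, which is how load is rebalanced at run-time without locks. All operations use the default sequentially consistent ordering for simplicity; a tuned implementation would likely relax some of them.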