z-logo
Premium
Fast in‐place, comparison‐based sorting with CUDA: a study with bitonic sort
Author(s) -
Peters Hagen,
SchulzHildebrandt Ole,
Luttenberger Norbert
Publication year - 2011
Publication title -
concurrency and computation: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.1686
Subject(s) - computer science , cuda , parallel computing , sorting , sorting algorithm , sorting network , sort , architecture , general purpose computing on graphics processing units , computer architecture , graphics , algorithm , operating system , database , art , visual arts
State‐of‐the‐art graphics processors provide high processing power and furthermore, the high programmability of GPUs offered by frameworks like CUDA (Compute Unified Device Architecture) increases their usability as high‐performance co‐processors for general‐purpose computing. Sorting is well investigated in Computer Science in general, but (because of this new field of application for GPUs) there is a demand for high‐performance parallel sorting algorithms that fit with the characteristics of the modern GPU‐architecture. We present a high‐performance in‐place implementation of Batcher's bitonic sorting networks for CUDA‐enabled GPUs. Therefore, we assigned compare/exchange operations to threads in a way that decreases low‐performance global‐memory access and makes efficient use of high‐performance shared memory. This greatly increases the performance of this in‐place, comparison‐based sorting algorithm. Our implementation outperforms all other algorithms in our tests when sorting 64‐bit keys. It is the fastest comparison‐based GPU sorting algorithm for 32‐bit keys, being only outperformed by (non‐comparison‐based) radix sort when sorting sequences larger than 2 23 . Copyright © 2011 John Wiley & Sons, Ltd.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here