Vectorized and performance‐portable quicksort
Author(s) - Jan Wassenberg, Mark Blacher, Joachim Giesen, Peter Sanders
Publication year - 2022
Publication title - Software: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.3142
Subject(s) - quicksort , computer science , speedup , sorting algorithm , parallel computing , instruction set , implementation , software portability , sort , pentium , sorting , algorithm , theoretical computer science , programming language , information retrieval
Recent works showed that implementations of quicksort using vector CPU instructions can outperform the non‐vectorized algorithms in widespread use. However, these implementations are typically single‐threaded, implemented for a particular instruction set, and restricted to a small set of key types. We lift these three restrictions: our proposed vqsort algorithm integrates into the state‐of‐the‐art parallel sorter IPS⁴o, with a geometric mean speedup of 1.59. The same implementation works on seven instruction sets (including SVE and RISC‐V V) across four platforms. It also supports floating‐point and 16–128 bit integer keys. To the best of our knowledge, this is the fastest sort for large arrays of non‐tuple keys on CPUs, up to 20 times as fast as the sorting algorithms implemented in standard libraries. This article focuses on the practical engineering aspects enabling the speed and portability, which we have not yet seen demonstrated for a quicksort implementation. Furthermore, we introduce compact and transpose‐free sorting networks for in‐register sorting of small arrays, and a vector‐friendly pivot sampling strategy that is robust against adversarial input.
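The abstract does not spell out how a vectorized quicksort partitions its input, so what follows is only a rough, hypothetical illustration of the general idea and not the authors' vqsort code. A block of keys is compared against the pivot to produce a lane mask, and a "compress" operation then packs the keys below the pivot to the left and the rest to the right. In real SIMD code this would be a single instruction (for example AVX-512 `vpcompressd`). Here the lanes are modeled with plain loops. `LANES`, `compress_store`, and `partition_compress` are names invented for this sketch, and the partition is out-of-place for simplicity. The in-place scheme, sorting networks, and robust pivot sampling described in the article are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector width: 8 lanes of int32_t (one 256-bit register). */
#define LANES 8

/* Scalar stand-in for a SIMD compress-store: copies the lanes of v whose
   bit is set in mask to out, preserving their order; returns the count. */
static size_t compress_store(const int32_t v[LANES], unsigned mask,
                             int32_t *out) {
    size_t written = 0;
    for (size_t lane = 0; lane < LANES; ++lane) {
        if (mask & (1u << lane)) {
            out[written++] = v[lane];
        }
    }
    return written;
}

/* Out-of-place partition around pivot: keys < pivot are packed at the
   front of out, keys >= pivot at the back. Returns the number of keys
   < pivot. */
size_t partition_compress(const int32_t *in, size_t n, int32_t pivot,
                          int32_t *out) {
    assert(in != out); /* this sketch does not partition in place */
    const unsigned all_lanes = (1u << LANES) - 1u;
    size_t left = 0;  /* next free slot for keys < pivot */
    size_t right = n; /* one past the last free slot for keys >= pivot */
    size_t i = 0;

    /* Whole "vectors": one comparison mask, then two compress-stores. */
    for (; i + LANES <= n; i += LANES) {
        int32_t v[LANES];
        memcpy(v, in + i, sizeof v);
        unsigned lt_mask = 0;
        size_t num_lt = 0;
        for (size_t lane = 0; lane < LANES; ++lane) {
            unsigned is_lt = (unsigned)(v[lane] < pivot);
            lt_mask |= is_lt << lane;
            num_lt += is_lt;
        }
        left += compress_store(v, lt_mask, out + left);
        right -= LANES - num_lt;
        compress_store(v, ~lt_mask & all_lanes, out + right);
    }

    /* Leftover keys that do not fill a whole vector: scalar fallback. */
    for (; i < n; ++i) {
        if (in[i] < pivot) {
            out[left++] = in[i];
        } else {
            out[--right] = in[i];
        }
    }
    assert(left == right); /* both sides meet exactly */
    return left;
}
```

For example, partitioning the 12 keys `{5, 1, 9, 3, 7, 2, 8, 6, 4, 0, 11, 10}` around pivot 6 returns 6, with the six smaller keys at the front of `out`. The order within each side is not preserved, which is acceptable for quicksort because both sides are sorted recursively afterwards.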
