Autotuning divide‐and‐conquer stencil computations
Author(s) -
Palamadai Natarajan Ekanathan,
Mehri Dehnavi Maryam,
Leiserson Charles
Publication year - 2017
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4127
Subject(s) - stencil , computer science , benchmark (surveying) , parallel computing , divide and conquer algorithms , code (set theory) , computation , heuristic , compiler , pruning , algorithm , operating system , programming language , computational science , artificial intelligence , geodesy , set (abstract data type) , agronomy , biology , geography
Summary This paper explores autotuning strategies for serial divide‐and‐conquer stencil computations, comparing the efficacy of traditional “heuristic” autotuning with that of “pruned‐exhaustive” autotuning. We present a pruned‐exhaustive autotuner called Ztune that searches for optimal divide‐and‐conquer trees for stencil computations. Ztune uses three pruning properties (space‐time equivalence, divide subsumption, and favored dimension) that greatly reduce the size of the search domain without significantly sacrificing the quality of the autotuned code. We compared the performance of Ztune with that of a state‐of‐the‐art heuristic autotuner called OpenTuner in tuning the divide‐and‐conquer algorithm used in the Pochoir stencil compiler. Over a nightly run on ten application benchmarks across two machines with different hardware configurations, the Ztuned code ran 5%–12% faster on average, and the OpenTuner‐tuned code ran from 9% slower to 2% faster on average, than Pochoir's default code. In the best case, the Ztuned code ran 40% faster, and the OpenTuner‐tuned code ran 33% faster, than Pochoir's code. Whereas the autotuning time of Ztune for each benchmark could be measured in minutes, the autotuning time OpenTuner needed to achieve comparable results was typically measured in hours or days. Surprisingly, for some benchmarks, Ztune finished autotuning in less time than it takes to perform the stencil computation once.
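To illustrate the general idea of pruning an exhaustive search over divide‐and‐conquer trees, here is a minimal Python sketch. It is not Ztune's implementation and uses none of its code. The one‐dimensional region, the toy cost model (`base_cost`, `DIVIDE_OVERHEAD`, `CACHE_SIZE`), and the function names are all hypothetical. Memoizing by region size is only loosely modeled on treating same‐shaped subproblems as equivalent, which is an assumption about what a property like space‐time equivalence provides. A real autotuner would time actual runs instead of evaluating a formula.

```python
DIVIDE_OVERHEAD = 1.0   # hypothetical cost of performing one divide step
CACHE_SIZE = 8          # hypothetical region size that still "fits in cache"


def base_cost(n):
    """Toy cost of running the base case directly on a region of n points."""
    # Linear while the region fits in the pretend cache, quadratic beyond it.
    return float(n) if n <= CACHE_SIZE else float(n * n)


def tune(n, memo, stats):
    """Return (best_cost, decision) for a region of n points.

    memo maps region size -> result; pass None to disable the pruning.
    stats["evaluated"] counts how many regions were actually examined.
    """
    if memo is not None and n in memo:
        return memo[n]          # same-sized region already tuned: reuse it
    stats["evaluated"] += 1

    # Option 1: stop dividing and run the base case.
    best = (base_cost(n), "base")

    # Option 2: divide into two halves and tune each half recursively.
    if n >= 2:
        left, right = n // 2, n - n // 2
        cost = (DIVIDE_OVERHEAD
                + tune(left, memo, stats)[0]
                + tune(right, memo, stats)[0])
        if cost < best[0]:
            best = (cost, "divide")

    if memo is not None:
        memo[n] = best
    return best


def compare(n):
    """Tune a region of size n with and without memoization."""
    full_stats = {"evaluated": 0}
    pruned_stats = {"evaluated": 0}
    full = tune(n, None, full_stats)
    pruned = tune(n, {}, pruned_stats)
    return full, pruned, full_stats["evaluated"], pruned_stats["evaluated"]


if __name__ == "__main__":
    full, pruned, n_full, n_pruned = compare(64)
    print("unpruned:", full, "regions examined:", n_full)
    print("pruned:  ", pruned, "regions examined:", n_pruned)
```

In this toy setting both searches reach the same decision for the root region, but the memoized search examines one region per distinct size rather than every node of the recursion tree. This gives a rough sense of how a pruning rule can shrink the search domain without changing the result.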