A Multi-level Optimization Strategy to Improve the Performance of Stencil Computation
Author(s) -
Gauthier Sornet,
Fabrice Dupros,
Sylvain Jubertie
Publication year - 2017
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2017.05.217
Subject(s) - stencil, computer science, parallel computing, compiler, vectorization, computation, cache, kernel, bandwidth (computing), memory bandwidth, set (abstract data type), instruction level parallelism, parallelism, algorithm, computational science, operating system, programming language, computer network, mathematics, combinatorics
Stencil computation is an important numerical kernel in scientific computing. Exploiting multi-core or many-core parallelism to optimize such operations is a major challenge because of their high memory-bandwidth demand and low arithmetic intensity. The situation is worsened by the complexity of current architectures and the potential impact of various mechanisms (cache memory, vectorization, compilation). In this paper, we describe a multi-level optimization strategy that combines manual vectorization, space tiling and stencil composition. A major part of this study is a comparison of our results with the Pochoir framework. We evaluate our methodology with three different compilers (Intel, Clang and GCC) on two recent generations of Intel multi-core platforms. Our results agree well with the theoretical performance models (i.e., roofline models). In the best case, we also outperform Pochoir by a factor of 2.5.