Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs
Author(s) - Carrijo Nasciutti Thiago, Panetta Jairo, Pais Lopes Pedro
Publication year - 2018
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4929
Subject(s) - stencil, parallel computing, computer science, grid, computation, computational science, supercomputer, algorithm, mathematics, geometry
Summary - This work compares the performance of optimizations that transform replicated global memory accesses into local memory accesses in 3D stencil computations on the NVIDIA Tesla K80 GPGPU. The optimizations reduce the global memory contention generated when the GPU's multiprocessors access global memory concurrently. The evaluated optimizations are grid tiling, insertion of spatial and temporal loops into kernels, register reuse, and some combinations of these. A standardized experiment measures how the performance of each optimization varies with grid size and stencil size. Experimental data show that codes using these optimizations run up to 3.3 times faster than the classical stencil formulation. The data also show that the most profitable optimization depends on the grid and stencil sizes.
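To make the idea of "replicated global memory accesses" concrete, here is a minimal CPU sketch in C, not the authors' code. It contrasts a classical 3D star stencil, where every output point reloads all of its neighbours, with a register-reuse variant that adds a spatial loop along z. In that variant each (i, j) column, which would be one GPU thread in an analogous kernel, keeps its z neighbours in local variables. The 7-point (radius-1) stencil, the coefficients `C0`/`C1`, and the function names `stencil_classic` and `stencil_zmarch` are hypothetical choices for illustration. The returned read count is only a rough proxy for global memory traffic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coefficients of a 7-point (radius-1) star stencil. */
#define C0 (-6.0f)
#define C1 (1.0f)

/* Linear index into an nx*ny*nz grid stored with x varying fastest. */
size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny) {
    return (k * ny + j) * nx + i;
}

/* Classical formulation: every interior point loads all 7 values
   from the input array ("global memory"). Returns the number of loads. */
size_t stencil_classic(const float *in, float *out,
                       size_t nx, size_t ny, size_t nz) {
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    size_t reads = 0;
    for (size_t k = 1; k + 1 < nz; ++k)
        for (size_t j = 1; j + 1 < ny; ++j)
            for (size_t i = 1; i + 1 < nx; ++i) {
                out[idx(i, j, k, nx, ny)] =
                    C0 * in[idx(i, j, k, nx, ny)] +
                    C1 * (in[idx(i - 1, j, k, nx, ny)] + in[idx(i + 1, j, k, nx, ny)] +
                          in[idx(i, j - 1, k, nx, ny)] + in[idx(i, j + 1, k, nx, ny)] +
                          in[idx(i, j, k - 1, nx, ny)] + in[idx(i, j, k + 1, nx, ny)]);
                reads += 7;
            }
    return reads;
}

/* Register reuse with a spatial loop along z: each (i, j) column marches
   in k and rotates the values at k-1, k, k+1 through local variables, so
   each step loads one new z value plus the four x/y neighbours. */
size_t stencil_zmarch(const float *in, float *out,
                      size_t nx, size_t ny, size_t nz) {
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    size_t reads = 0;
    for (size_t j = 1; j + 1 < ny; ++j)
        for (size_t i = 1; i + 1 < nx; ++i) {
            float below = in[idx(i, j, 0, nx, ny)];
            float center = in[idx(i, j, 1, nx, ny)];
            reads += 2;
            for (size_t k = 1; k + 1 < nz; ++k) {
                float above = in[idx(i, j, k + 1, nx, ny)];
                out[idx(i, j, k, nx, ny)] =
                    C0 * center +
                    C1 * (in[idx(i - 1, j, k, nx, ny)] + in[idx(i + 1, j, k, nx, ny)] +
                          in[idx(i, j - 1, k, nx, ny)] + in[idx(i, j + 1, k, nx, ny)] +
                          below + above);
                reads += 1 + 4;
                below = center;   /* shift the z window up by one */
                center = above;
            }
        }
    return reads;
}
```

On an 8×8×8 grid the classical version performs 216 × 7 = 1512 loads, and the z-marching version performs 36 × (2 + 6 × 5) = 1152 while producing the same output. In this sketch the x/y neighbours are still reloaded. Grid tiling, where a thread block stages an x-y tile in on-chip shared memory, is the kind of optimization that targets those replicated loads.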