z-logo
open-access-imgOpen Access
OpenACC Cache Directive: Opportunities and Optimizations.
Author(s) -
Ahmad Lashgar,
Amirali Baniasadi
Publication year - 2016
Publication title -
2016 third workshop on accelerator programming using directives (waccpd)
Language(s) - English
DOI - 10.1109/waccpd.2016.8
OpenACC's programming model presents a simple interface to programmers, offering a trade-off between performance and development effort. OpenACC relies on compiler technologies to generate efficient code and optimize for performance. Among the difficult to implement directives, is the cache directive. The cache directive allows the programmer to utilize accelerator's hardware- or software-managed caches by passing hints to the compiler. In this paper, we investigate the implementation aspect of cache directive under NVIDIA-like GPUs and propose optimizations for the CUDA backend. We use CUDA's shared memory as the software-managed cache space. We first show that a straightforward implementation can be very inefficient, and downgrade performance. We investigate the differences between this implementation and hand-written CUDA alternatives and introduce the following optimizations to bridge the performance gap between the two: i) improving occupancy by sharing the cache among several parallel threads and ii) optimizing cache fetch and write routines via parallelization and minimizing control flow. We present compiler passes to apply these optimizations. Investigating three test cases, we show that the best cache directive implementation can perform very close to hand-written CUDA equivalent and improve performance up to 2.18X (compared to the baseline OpenACC.)

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom