Open Access

OpenACC Cache Directive: Opportunities and Optimizations.

Author(s) - Ahmad Lashgar, Amirali Baniasadi

Publication year - 2016

Publication title - 2016 Third Workshop on Accelerator Programming Using Directives (WACCPD)

Language(s) - English

DOI - 10.1109/waccpd.2016.8

OpenACC's programming model presents a simple interface to programmers, offering a trade-off between performance and development effort. OpenACC relies on compiler technologies to generate efficient code and optimize for performance. Among the directives that are difficult to implement is the cache directive. The cache directive allows the programmer to utilize the accelerator's hardware- or software-managed caches by passing hints to the compiler. In this paper, we investigate the implementation of the cache directive on NVIDIA-like GPUs and propose optimizations for the CUDA backend. We use CUDA's shared memory as the software-managed cache space. We first show that a straightforward implementation can be very inefficient and can degrade performance. We investigate the differences between this implementation and hand-written CUDA alternatives and introduce the following optimizations to bridge the performance gap between the two: i) improving occupancy by sharing the cache among several parallel threads and ii) optimizing cache fetch and write routines via parallelization and minimizing control flow. We present compiler passes to apply these optimizations. Investigating three test cases, we show that the best cache directive implementation can perform very close to the hand-written CUDA equivalent and improve performance by up to 2.18X (compared to the baseline OpenACC).
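
For readers unfamiliar with the directive, the following is a minimal sketch of how a cache hint might appear in OpenACC C code. It is not taken from the paper: the 3-point averaging stencil, the function names stencil3 and stencil3_selfcheck, and the array sizes are all assumptions chosen for illustration. The cache directive sits at the top of the loop body and names the subarray a[i-1:3] (start index and length) that neighbouring iterations reuse. A compiler without OpenACC support ignores the pragmas, so the code also runs as plain sequential C.

```c
#include <assert.h>

/* Hypothetical 3-point averaging stencil; all names are illustrative only. */
void stencil3(const float *a, float *b, int n)
{
    /* Offload the loop; b uses copy so untouched boundary elements survive. */
    #pragma acc parallel loop copyin(a[0:n]) copy(b[0:n])
    for (int i = 1; i < n - 1; ++i) {
        /* Hint that a[i-1], a[i], a[i+1] are worth keeping in a fast,
           software- or hardware-managed cache for this iteration. */
        #pragma acc cache(a[i-1:3])
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0f;
    }
}

/* Small host-side check: averaging a linear ramp reproduces the ramp. */
int stencil3_selfcheck(void)
{
    enum { N = 1024 };
    static float a[N], b[N];

    for (int i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = 0.0f;
    }
    stencil3(a, b, N);

    /* Interior points: ((i-1) + i + (i+1)) / 3 equals i. */
    for (int i = 1; i < N - 1; ++i)
        assert(b[i] == (float)i);
    return 1;
}
```

Whether the hint helps depends on how the compiler maps it to the accelerator. In a CUDA backend such as the one the abstract describes, the cached subarray would presumably be staged in shared memory, which is where the occupancy and fetch-routine costs mentioned above come into play.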
