Premium
Compiler transformation of nested loops for general purpose GPUs
Author(s) -
Tian Xiaonan,
Xu Rengan,
Yan Yonghong,
Chandrasekaran Sunita,
Eachempati Deepak,
Chapman Barbara
Publication year - 2016
Publication title -
concurrency and computation: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3648
Subject(s) - compiler , computer science , parallel computing , directive , suite , general purpose , scheduling (production processes) , programming paradigm , optimizing compiler , programming language , computer architecture , operations management , archaeology , economics , history
Summary Manycore accelerators have the potential to significantly improve performance of scientific applications when offloading computationally intensive program portions to accelerators. Directive‐based high‐level programming models, such as OpenACC and OpenMP, are used to create applications for accelerators through annotating regions of code meant for offloading. OpenACC is an emerging directive‐based programming model for programming accelerators that typically enable inexperienced programmers to achieve portable and productive performance within applications. In this paper, we present our research in developing challenges and solutions when creating an open‐source OpenACC compiler in an industrial framework (OpenUH as a branch of Open64). We then discuss in detail techniques we developed for loop scheduling reduction operations on general purpose GPUs. The compiler is evaluated with benchmarks from the NAS Parallel Benchmarks suite and self‐written micro‐benchmarks for reduction operations. This implementation has been designed to serve as a compiler infrastructure for researchers to explore advanced compiler techniques, extend OpenACC to other programming models, and build performance tools used in conjunction with OpenACC programs. Copyright © 2015 John Wiley & Sons, Ltd.