OpenCL performance portability for general‐purpose computation on graphics processor units: an exploration on cryptographic primitives
Author(s) -
Agosta Giovanni,
Barenghi Alessandro,
Di Federico Alessandro,
Pelosi Gerardo
Publication year - 2014
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3358
Subject(s) - computer science, software portability, compiler, computer architecture, exploit, cryptography, instruction set, domain specific language, domain (mathematical analysis), architecture, set (abstract data type), parallel computing, programming language, art, mathematical analysis, computer security, mathematics, visual arts
Summary The modern trend toward heterogeneous many‐core architectures has led to high architectural diversity in both high‐performance and high‐end embedded systems. To exploit the computational resources of such a wide range of architectures effectively, programming languages and APIs such as OpenCL have become increasingly popular. Although OpenCL provides functional code portability and the ability to fine‐tune an application to the target hardware, performance portability is still an open problem. Consequently, many research works have investigated the optimization of specific combinations of application and target platform. In this paper, we leverage the experience gained in implementing algorithms from the cryptography domain to provide a set of guidelines for performance portability across modern many‐core heterogeneous architectures and to establish a base on which domain‐specific languages and compiler transformations could be built in the near future. We study algorithmic choices and the effect of compiler transformations on three representative applications from the chosen domain on a set of seven target platforms. To estimate how well an application fits an architecture, we define a metric of computational intensity for both the architecture and the application implementation. Besides being useful for comparing different implementations or algorithmic choices and their fitness to a specific architecture, the metric can also help the compiler guide the code optimization process. Copyright © 2014 John Wiley & Sons, Ltd.