Generating optimal CUDA sparse matrix–vector product implementations for evolving GPU hardware
Author(s) - El Zein, Ahmed H.; Rendell, Alistair P.
Publication year - 2012
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.1732
Subject(s) - cuda , computer science , programmer , implementation , parallel computing , graphics , sparse matrix , process (computing) , general purpose computing on graphics processing units , computer architecture , computer graphics (images) , embedded system , programming language , physics , quantum mechanics , gaussian
SUMMARY The CUDA model for graphics processing units (GPUs) presents the programmer with a plethora of different programming options. These include different memory types, different memory access methods and different data types. Identifying which options to use and when is a non‐trivial exercise. This paper explores the effect of these different options on the performance of a routine that evaluates sparse matrix–vector products (SpMV) across three different generations of NVIDIA GPU hardware. A process for analysing performance and selecting the subset of implementations that perform best is proposed. The potential for mapping sparse matrix attributes to optimal CUDA SpMV implementations is discussed. Copyright © 2011 John Wiley & Sons, Ltd.
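To make the design space concrete, the sketch below shows a minimal CSR-format SpMV kernel in CUDA with one thread per row, using single precision and plain global-memory accesses. It is an illustrative baseline only, not the paper's implementation; the paper's variants differ precisely in the choices this sketch fixes (memory type, access method, data type), and all names here are hypothetical.

```cuda
// Minimal scalar CSR SpMV sketch: one thread computes one row of y = A*x.
// Assumptions (not from the paper): CSR storage, float data, global memory only.
__global__ void spmv_csr_scalar(int num_rows,
                                const int *row_ptr,   // row start offsets, length num_rows+1
                                const int *col_idx,   // column index of each nonzero
                                const float *values,  // nonzero values
                                const float *x,       // dense input vector
                                float *y)             // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        // Dot product of one sparse row with the dense vector x.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += values[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

// Example launch: spmv_csr_scalar<<<(num_rows + 255) / 256, 256>>>(...);
```

Alternatives explored in work of this kind include caching x through texture or constant memory, assigning a warp rather than a single thread to each row for coalesced access, and switching between single and double precision; each choice interacts differently with successive GPU generations.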