Accurate cross-architecture performance modeling for sparse matrix-vector multiplication (SpMV) on GPUs
Author(s) -
Guo Ping,
Wang Liqiang
Publication year - 2014
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3217
Subject(s) - computer science , parallel computing , computer architecture , CUDA , sparse matrix , kernel , double precision floating point format , performance modeling , computational science , algorithm , computation
Summary This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool that provides inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and build the tool, we investigate the inter-architecture relative performance of multiple SpMV kernels. For a sparse matrix, based on its SpMV kernel performance measured on a reference architecture, our cross-architecture performance modeling tool can accurately predict its SpMV kernel performance on a target architecture. The prediction results can effectively assist researchers in choosing, from a wide range of available computing architectures, the architecture that best fits their needs. We evaluate our tool with 14 widely used sparse matrices on four GPU architectures: NVIDIA Tesla C2050, Tesla M2090, Tesla K20m, and GeForce GTX 295. In our experiments, Tesla C2050 serves as the reference architecture, and the other three as the target architectures. For Tesla M2090, the average performance differences between the predicted and measured SpMV kernel execution times for the CSR, ELL, COO, and HYB SpMV kernels are 3.1%, 5.1%, 1.6%, and 5.6%, respectively. For Tesla K20m, they are 6.9%, 5.9%, 4.0%, and 6.6% on average, respectively. For GeForce GTX 295, they are 5.9%, 5.8%, 3.8%, and 5.9% on average, respectively. Copyright © 2014 John Wiley & Sons, Ltd.
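The abstract does not specify the form of the paper's prediction model. As an illustrative sketch only (not the authors' actual method), the idea of predicting a target-GPU SpMV kernel time from a reference-GPU measurement can be demonstrated with a simple per-format linear scaling model fitted from a few profiled matrices; all function names and timing numbers below are hypothetical:

```python
# Illustrative sketch, NOT the paper's model: a per-kernel-format linear
# cross-architecture scaling target_time ~= a * ref_time + b, fitted by
# least squares from profiled training matrices, then used to predict the
# SpMV time on the target GPU. All timing numbers are made up.

def fit_linear_model(ref_times, target_times):
    """Least-squares fit of target_time ~= a * ref_time + b."""
    n = len(ref_times)
    mean_x = sum(ref_times) / n
    mean_y = sum(target_times) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ref_times, target_times))
    var = sum((x - mean_x) ** 2 for x in ref_times)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def predict(ref_time, model):
    """Predict the target-architecture kernel time from a reference time."""
    a, b = model
    return a * ref_time + b

def relative_error(predicted, measured):
    """Percentage difference between predicted and measured times,
    the metric reported in the abstract."""
    return abs(predicted - measured) / measured * 100.0

# Hypothetical CSR kernel times (ms) for training matrices, measured on a
# reference GPU (e.g. Tesla C2050) and a target GPU (e.g. Tesla M2090):
ref = [1.20, 2.50, 0.80, 4.10]
tgt = [0.95, 2.00, 0.65, 3.30]

model = fit_linear_model(ref, tgt)
pred = predict(1.50, model)        # predict for a previously unseen matrix
err = relative_error(pred, 1.21)   # compare against a measured target time
print(f"predicted {pred:.3f} ms, error {err:.1f}%")
```

In practice such a model would be fitted separately per sparse-matrix storage format (CSR, ELL, COO, HYB), since each kernel stresses the memory system differently; the paper reports per-format average errors for exactly this reason.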