Accurate cross-architecture performance modeling for sparse matrix-vector multiplication (SpMV) on GPUs
Author(s) - Guo Ping, Wang Liqiang
Publication year - 2014
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3217
Subject(s) - computer science, kernel (algebra), multiplication (music), parallel computing, architecture, cuda, double precision floating point format, performance improvement, sparse matrix, matrix (chemical analysis), computer architecture, computational science, algorithm, mathematics, computation, chemistry, art, operations management, computational chemistry, combinatorics, chromatography, economics, visual arts, gaussian
Summary - This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool that provides inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance of multiple SpMV kernels. For a sparse matrix, given its SpMV kernel performance measured on a reference architecture, our cross-architecture performance modeling tool can accurately predict its SpMV kernel performance on a target architecture. The prediction results can help researchers choose, from a wide range of available computing architectures, the one that best fits their needs. We evaluate our tool with 14 widely used sparse matrices on four GPU architectures: NVIDIA Tesla C2050, Tesla M2090, Tesla K20m, and GeForce GTX 295. In our experiments, Tesla C2050 serves as the reference architecture, and the other three as the target architectures. For Tesla M2090, the average differences between the predicted and measured SpMV kernel execution times for the CSR, ELL, COO, and HYB kernels are 3.1%, 5.1%, 1.6%, and 5.6%, respectively. For Tesla K20m, they are 6.9%, 5.9%, 4.0%, and 6.6% on average, respectively. For GeForce GTX 295, they are 5.9%, 5.8%, 3.8%, and 5.9% on average, respectively. Copyright © 2014 John Wiley & Sons, Ltd.
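For context, the CSR kernel referred to above computes y = A x for a sparse matrix stored in compressed sparse row format. The sketch below is a generic, textbook scalar CSR SpMV CUDA kernel (one thread per row), shown only to illustrate the kind of kernel whose execution time the tool models; it is not the authors' implementation, and all identifiers in it are illustrative.

```cuda
// Minimal scalar CSR SpMV kernel: one thread computes one row of y = A * x.
// Generic textbook sketch, not the kernel evaluated in the paper.
__global__ void spmv_csr_scalar(int num_rows,
                                const int *row_ptr,     // CSR row offsets, length num_rows + 1
                                const int *col_idx,     // column indices of nonzeros
                                const double *values,   // nonzero values
                                const double *x,        // dense input vector
                                double *y)              // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        double dot = 0.0;
        // Accumulate the dot product of row `row` with x over its nonzeros.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j) {
            dot += values[j] * x[col_idx[j]];
        }
        y[row] = dot;
    }
}
```

The ELL, COO, and HYB formats mentioned in the abstract organize the same nonzeros differently, trading memory coalescing against load balance across threads, which is why the tool models each kernel's cross-architecture behavior separately.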
