
Auto‐tuning of level 1 and level 2 BLAS for GPUs

Author(s) - Hans Henrik Brandenborg Sørensen

Publication year - 2012

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.2916

Subject(s) - computer science, parallel computing, linear algebra, matrix multiplication, subroutine, general purpose, CUDA, computational science, computer architecture, operating system, mathematics

SUMMARY The use of high‐performance libraries for dense linear algebra operations is of great importance in many numerical scientific applications. The most common operations form the backbone of the Basic Linear Algebra Subprograms (BLAS) library. In this paper, we consider the performance and auto‐tuning of level 1 and level 2 BLAS routines on graphics processing units (GPUs). As examples, we develop single‐precision Compute Unified Device Architecture (CUDA) kernels for three of the most popular operations: the Euclidean norm (SNRM2), the matrix–vector multiplication (SGEMV), and the triangular solve (STRSV). The target hardware is the Nvidia (Santa Clara, CA, USA) Tesla 20‐series (Fermi architecture), the most recent generation at the time of writing (2012), which is designed from the ground up for high‐performance computing. We show that achieving high performance for level 1 and level 2 BLAS operations is essentially a matter of fully utilizing the fine‐grained parallelism of the many‐core GPU. We show that auto‐tuning can be successfully applied to kernels for these operations so that they perform well for all input sizes. Copyright © 2012 John Wiley & Sons, Ltd.
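The auto‐tuning idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: it is a plain C++ stand‐in that runs on the CPU, in which the hypothetical parameter `chunk` plays the role that a thread‐block size might play in a CUDA kernel. The function names `snrm2_chunked` and `autotune_chunk`, the candidate values, and the timing strategy are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a tunable SNRM2 kernel: the sum of squares is
// accumulated in chunks of `chunk` elements, mimicking per-block partial sums.
float snrm2_chunked(const std::vector<float>& x, std::size_t chunk) {
    assert(chunk > 0);
    double total = 0.0;  // combine partial sums in double to limit rounding
    for (std::size_t start = 0; start < x.size(); start += chunk) {
        const std::size_t end = std::min(start + chunk, x.size());
        float partial = 0.0f;
        for (std::size_t i = start; i < end; ++i) partial += x[i] * x[i];
        total += partial;
    }
    return static_cast<float>(std::sqrt(total));
}

// Empirical auto-tuning (assumed strategy): time every candidate value of the
// tuning parameter on the given input and return the fastest one.
std::size_t autotune_chunk(const std::vector<float>& x,
                           const std::vector<std::size_t>& candidates,
                           int repeats = 5) {
    assert(!candidates.empty());
    std::size_t best = candidates.front();
    double best_time = 1e300;
    volatile float sink = 0.0f;  // keeps the compiler from dropping the calls
    for (std::size_t c : candidates) {
        const auto t0 = std::chrono::steady_clock::now();
        for (int r = 0; r < repeats; ++r) sink = snrm2_chunked(x, c);
        const auto t1 = std::chrono::steady_clock::now();
        const double dt = std::chrono::duration<double>(t1 - t0).count();
        if (dt < best_time) {
            best_time = dt;
            best = c;
        }
    }
    (void)sink;
    return best;
}
```

A caller would typically run `autotune_chunk` once per input size of interest, for example with candidates `{32, 64, 128, 256}`, and cache the winner so that later calls to `snrm2_chunked` reuse it. Which candidate wins depends on the machine and the input, so no particular result should be expected.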
