Reducing the amount of out‐of‐core data access for GPU‐accelerated randomized SVD
Author(s) - Lu Yuechao, Yamazaki Ichitaro, Ino Fumihiko, Matsushita Yasuyuki, Tomov Stanimire, Dongarra Jack
Publication year - 2020
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5754
Subject(s) - computer science , parallel computing , singular value decomposition , partition (number theory) , multi core processor , out of core algorithm , matrix multiplication , graphics , data access , graphics processing unit , single core , algorithm , computational science , mathematics , computer graphics (images) , physics , combinatorics , quantum mechanics , quantum , programming language
Summary - We propose two acceleration methods, Fused and Gram, for reducing out‐of‐core data access when performing randomized singular value decomposition (RSVD) on graphics processing units (GPUs). Out‐of‐core data here are data too large to fit into GPU memory at once. Both methods accelerate GPU‐enabled RSVD using three schemes: (1) a highly tuned general matrix‐matrix multiplication (GEMM) scheme for processing out‐of‐core data on GPUs; (2) a data‐access reduction scheme based on a one‐dimensional data partition; and (3) a first‐in, first‐out scheme that reduces CPU‐GPU data transfer using reverse iteration. The Fused method further reduces out‐of‐core data access by merging two GEMM operations into a single operation. The Gram method, by contrast, reduces both in‐core and out‐of‐core data access by explicitly forming the Gram matrix. In our experiments, the Fused and Gram methods improved RSVD performance by up to 1.7× and 5.2×, respectively, over a straightforward method that deploys schemes (1) and (2) on the GPU. We also present a case study in which the Gram method is used to accelerate robust principal component analysis, a convex optimization problem in machine learning.
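To make the linear algebra behind the abstract concrete, the following is a minimal NumPy sketch, not the paper's GPU implementation: a basic randomized SVD in the Halko-Martinsson-Tropp style, and a Gram-matrix variant illustrating the identity (A Aᵀ)^q A Ω = A (Aᵀ A)^q Ω, which lets the power iterations run on the smaller n×n Gram matrix G = Aᵀ A instead of streaming the large A repeatedly. All function names and parameters here are illustrative assumptions; the paper's Fused/Gram methods additionally tile the GEMMs for out‐of‐core GPU execution, which this sketch omits.

```python
import numpy as np

def rsvd(A, k, p=8, q=1, rng=None):
    """Basic randomized SVD: rank-k approximation with oversampling p
    and q power iterations. Each power iteration makes two more passes
    over A (the out-of-core GEMMs on a GPU)."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # random test matrix
    Y = A @ Omega                             # first pass over A
    for _ in range(q):
        Y = A @ (A.T @ Y)                     # two passes over A per iteration
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis for range(A)
    B = Q.T @ A                               # small (k+p) x n projection
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

def rsvd_gram(A, k, p=8, q=1, rng=None):
    """Gram-matrix variant: form G = A^T A once (one pass over A), then
    run the power iterations on the in-core n x n matrix G, so only one
    further pass over A is needed to build the range sample Y."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = A.shape
    G = A.T @ A                               # one pass over A
    Z = rng.standard_normal((n, k + p))
    for _ in range(q):
        Z = G @ Z                             # in-core GEMMs on G
    Y = A @ Z                                 # one more pass over A
    Q, _ = np.linalg.qr(Y)
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]
```

Because (A Aᵀ)^q A Ω = A (Aᵀ A)^q Ω, both routines sample the same subspace; the Gram variant simply trades repeated passes over the large A for products with the smaller G, which mirrors the data-access reduction the abstract attributes to the Gram method.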