Preconditioned Block‐Iterative Methods on GPUs
Author(s) - Maxim Naumov
Publication year - 2012
Publication title - PAMM (Proceedings in Applied Mathematics and Mechanics)
Language(s) - English
Resource type - Journals
ISSN - 1617-7061
DOI - 10.1002/pamm.201210004
Subject(s) - cuda, cholesky decomposition, biconjugate gradient stabilized method, block (permutation group theory), parallel computing, speedup, computer science, incomplete cholesky factorization, iterative method, conjugate gradient method, sparse matrix, incomplete lu factorization, computational science, linear system, general purpose computing on graphics processing units, solver, graphics, algorithm, mathematics, matrix decomposition, computer graphics (images), combinatorics, mathematical analysis, eigenvalues and eigenvectors, physics, programming language, gaussian, quantum mechanics
An implementation of incomplete-LU/Cholesky preconditioned block-iterative methods on Graphics Processing Units (GPUs) using the CUDA parallel programming model is presented. In particular, we focus on the tradeoffs associated with sparse matrix-vector multiplication with multiple vectors, sparse triangular solve with multiple right-hand sides (rhs), and incomplete factorization with 0 fill-in. We use these building blocks to implement the block-CG and BiCGStab iterative methods for symmetric positive definite (s.p.d.) and nonsymmetric linear systems, respectively. Our numerical experiments also show that the implementation of the preconditioned block-iterative methods using the CUSPARSE library on the GPU achieves an average of 3x speedup over their MKL implementation on the CPU. (© 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)
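To illustrate the first building block mentioned in the abstract, the sketch below shows a CSR sparse matrix multiplied by several vectors at once (the SpMM operation y_i = A x_i for i = 1..s). This is a minimal, simplified example, not the paper's or CUSPARSE's implementation: it assumes column-major storage for the block of vectors and assigns one thread per (row, vector) pair, whereas a tuned kernel would use a more elaborate thread mapping and memory layout. All names in the code are hypothetical.

```cuda
// Minimal sketch of CSR sparse matrix times a block of s vectors (SpMM).
// Assumptions (not from the paper): X and Y are m-by-s, column-major;
// one thread handles one (row, vector) pair.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void csr_spmm(int m, int s,
                         const int *rowPtr, const int *colInd,
                         const double *val,
                         const double *X,  // m x s input vectors, column-major
                         double *Y)        // m x s output vectors, column-major
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;  // matrix row
    int vec = blockIdx.y;                             // which column vector
    if (row < m && vec < s) {
        double sum = 0.0;
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            sum += val[j] * X[vec * m + colInd[j]];
        Y[vec * m + row] = sum;
    }
}

int main() {
    // Tiny example: 3x3 tridiagonal matrix applied to s = 2 vectors.
    const int m = 3, s = 2;
    int    hRowPtr[] = {0, 2, 5, 7};
    int    hColInd[] = {0, 1, 0, 1, 2, 1, 2};
    double hVal[]    = {2, -1, -1, 2, -1, -1, 2};
    double hX[]      = {1, 1, 1,  1, 2, 3};  // two column vectors
    double hY[m * s];

    int *dRowPtr, *dColInd; double *dVal, *dX, *dY;
    cudaMalloc(&dRowPtr, sizeof hRowPtr); cudaMalloc(&dColInd, sizeof hColInd);
    cudaMalloc(&dVal, sizeof hVal); cudaMalloc(&dX, sizeof hX);
    cudaMalloc(&dY, sizeof hY);
    cudaMemcpy(dRowPtr, hRowPtr, sizeof hRowPtr, cudaMemcpyHostToDevice);
    cudaMemcpy(dColInd, hColInd, sizeof hColInd, cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, hVal, sizeof hVal, cudaMemcpyHostToDevice);
    cudaMemcpy(dX, hX, sizeof hX, cudaMemcpyHostToDevice);

    // 2D grid: x covers rows, y covers the s vectors.
    dim3 grid((m + 127) / 128, s);
    csr_spmm<<<grid, 128>>>(m, s, dRowPtr, dColInd, dVal, dX, dY);
    cudaMemcpy(hY, dY, sizeof hY, cudaMemcpyDeviceToHost);

    for (int v = 0; v < s; ++v)
        printf("y%d = [%g %g %g]\n", v, hY[v*m+0], hY[v*m+1], hY[v*m+2]);
    return 0;
}
```

Processing all s vectors in one kernel amortizes the reads of the matrix data (rowPtr, colInd, val) across the block of vectors, which is the main reason block-iterative methods can be attractive on GPUs compared with running s independent single-vector solves.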