Supernodal sparse Cholesky factorization on graphics processing units




Author(s) - Zou Dan, Dou Yong, Guo Song, Li Rongchun, Deng Lin

Publication year - 2014

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3158

Subject(s) - cholesky decomposition, computer science, parallel computing, minimum degree algorithm, incomplete cholesky factorization, cuda, sparse matrix, graphics processing unit, solver, general purpose computing on graphics processing units, multi core processor, factorization, computational science, graphics, algorithm, eigenvalues and eigenvectors, physics, computer graphics (images), quantum mechanics, programming language, gaussian

SUMMARY Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. Many sparse Cholesky factorization algorithms have been developed, each exploiting the architectural features of a particular computing platform. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization on GPUs has not been explored sufficiently because of the complexity of implementing it efficiently and concerns about low GPU utilization. In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for the GPU and propose a queue-based approach for generating and scheduling GPU tasks that consist of dense linear algebra operations. We also design a subtree-based parallel method for multi-GPU systems. These approaches increase GPU utilization and thus substantially reduce computation time. Comparisons are made with existing parallel solvers by using problems arising from practical applications. The experimental results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12-core node, we obtained speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state-of-the-art solver based on the supernodal method for CPU-GPU heterogeneous platforms, we obtained speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs. Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd.
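The summary does not give implementation details, so the following is only a rough illustration of the general idea of queue-driven supernodal factorization, not the authors' method. It is a CPU-only NumPy sketch that stores the matrix densely and assumes a hypothetical block layout (`blocks`) and elimination tree (`parent`) supplied by the caller. A block becomes ready once all of its children in the tree have been factored, and each ready block is processed with the three dense operations usually named POTRF, TRSM, and SYRK/GEMM.

```python
from collections import deque

import numpy as np


def factor_with_queue(A, blocks, parent):
    """Block Cholesky A = L @ L.T driven by a ready queue (illustrative sketch).

    blocks: list of (start, stop) column ranges, one per supernode-like block.
    parent: parent[i] is the elimination-tree parent of block i, or -1 for a root.
    Both arguments are assumptions of this sketch, not structures from the paper.
    """
    n = A.shape[0]
    W = np.array(A, dtype=float)  # working copy, updated in place

    # Count unfinished children; blocks with none pending can start at once.
    pending = [0] * len(blocks)
    for p in parent:
        if p >= 0:
            pending[p] += 1
    ready = deque(i for i, c in enumerate(pending) if c == 0)

    order = []
    while ready:
        k = ready.popleft()
        s, e = blocks[k]

        # POTRF: factor the dense diagonal block.
        Lkk = np.linalg.cholesky(W[s:e, s:e])
        W[s:e, s:e] = Lkk

        if e < n:
            # TRSM: solve X @ Lkk.T = W[e:, s:e] for the panel below the diagonal.
            panel = np.linalg.solve(Lkk, W[e:, s:e].T).T
            W[e:, s:e] = panel
            # SYRK/GEMM: apply the panel's update to the trailing submatrix.
            W[e:, e:] -= panel @ panel.T

        order.append(k)
        p = parent[k]
        if p >= 0:
            pending[p] -= 1
            if pending[p] == 0:
                ready.append(p)  # all children done, parent becomes ready

    return np.tril(W), order


# Toy block-arrow matrix: two independent leaf blocks plus a border (root) block.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(6, 6))
S = S + S.T
S[0:2, 2:4] = 0.0  # no coupling between the two leaf blocks
S[2:4, 0:2] = 0.0
A = S + 20.0 * np.eye(6)  # strictly diagonally dominant, hence positive definite

blocks = [(0, 2), (2, 4), (4, 6)]
parent = [2, 2, -1]
L, order = factor_with_queue(A, blocks, parent)
```

In this toy example the two leaf blocks do not depend on each other, which is the kind of independence a subtree-based scheme could exploit by assigning each subtree to a different device. A real sparse implementation would store only the nonzero rows of each supernode and restrict the update to the ancestors it touches.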
