Parallel LU Factorization on GPU Cluster
Author(s) -
E. D’Azevedo,
Judith Hill
Publication year - 2012
Publication title -
procedia computer science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2012.04.008
Subject(s) - computer science , parallel computing , gpu cluster , factorization , cluster (spacecraft) , computational science , cuda , computer graphics (images) , algorithm , operating system
This paper describes our progressindeveloping softwarefor performing parallelLUfactorizationofalarge dense matrix on a GPU cluster. Three approaches, with increasing software complexity, are considered: (i) a naive “thunking” approach that links the existing parallel ScaLAPACK software library with cuBLAS through a software emulation layer; (ii) a more intrusive magmaBLAS implementation integrated into the LU solver in the High-Performance Linpack software; and (iii) a left-looking out-of-core algorithm for solving problems that are larger than the available memory on GPUdevices. Comparisonof the performancegainsversus the current ScaLAPACK PZGETRF are provided
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom