
The design and implementation of the parallel out-of-core ScaLAPACK LU, QR and Cholesky factorization routines
Author(s) - Eduardo F. D'Azevedo, Jack Dongarra
Publication year - 1997
Language(s) - English
Resource type - Reports
DOI - 10.2172/296722
Subject(s) - cholesky decomposition, computer science, parallel computing, qr decomposition, factorization, incomplete cholesky factorization, incomplete lu factorization, scalability, lu decomposition, matrix decomposition, multi core processor, algorithm, operating system, eigenvalues and eigenvectors, physics, quantum mechanics
This paper describes the design and implementation of three core factorization routines (LU, QR and Cholesky) included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense linear system that is too large to fit entirely in physical memory. An image of the full matrix is maintained on disk, and the factorization routines transfer sub-matrices into memory as needed. The left-looking, column-oriented variant of each factorization algorithm is implemented to reduce disk I/O traffic. The routines are implemented using a portable I/O interface and use high-performance ScaLAPACK factorization routines as in-core computational kernels. The authors present the details of the implementation of the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on the Intel Paragon.
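The left-looking, column-oriented scheme mentioned in the abstract can be illustrated with a small serial sketch for the Cholesky case. This is not the ScaLAPACK code or its interface: the function names (`write_panel`, `read_panel`, `ooc_cholesky`), the pickle files standing in for the portable I/O layer, and the pure-Python loops standing in for the in-core kernels are all assumptions made for illustration. Each column panel is read once, updated by every previously factored panel (read-only traffic from disk), factored in memory, and written back once.

```python
import math
import os
import pickle
import tempfile

def write_panel(dirname, k, cols):
    # Hypothetical stand-in for the out-of-core I/O layer: one file per panel.
    with open(os.path.join(dirname, f"panel_{k}.bin"), "wb") as f:
        pickle.dump(cols, f)

def read_panel(dirname, k):
    with open(os.path.join(dirname, f"panel_{k}.bin"), "rb") as f:
        return pickle.load(f)

def ooc_cholesky(dirname, n, width):
    """Left-looking Cholesky A = L L^T on column panels stored on disk.

    Each panel is a list of full-length columns; panels are overwritten by L.
    """
    num_panels = (n + width - 1) // width
    for k in range(num_panels):
        j0 = k * width
        panel = read_panel(dirname, k)  # only this panel is held in memory
        # Left-looking step: apply updates from all previously factored panels.
        for p in range(k):
            left = read_panel(dirname, p)  # read-only; never written again
            for c, col in enumerate(panel):
                j = j0 + c
                for lcol in left:
                    ljk = lcol[j]
                    for i in range(j, n):
                        col[i] -= lcol[i] * ljk
        # In-core factorization of the updated panel (unblocked, for clarity).
        for c, col in enumerate(panel):
            j = j0 + c
            for prev in panel[:c]:
                ljk = prev[j]
                for i in range(j, n):
                    col[i] -= prev[i] * ljk
            d = math.sqrt(col[j])
            for i in range(j, n):
                col[i] /= d
            for i in range(j):
                col[i] = 0.0  # clear the strictly upper part
        write_panel(dirname, k, panel)  # single write per panel

# Usage: factor a 3x3 symmetric positive definite matrix with panel width 2.
A_cols = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
n, width = 3, 2
with tempfile.TemporaryDirectory() as d:
    for k in range((n + width - 1) // width):
        write_panel(d, k, [list(c) for c in A_cols[k * width:(k + 1) * width]])
    ooc_cholesky(d, n, width)
    L_cols = []
    for k in range((n + width - 1) // width):
        L_cols.extend(read_panel(d, k))
```

In this sketch every panel is written exactly once, while previously factored panels are only read, which is the property that makes the left-looking ordering attractive when writes to disk are costly. A right-looking variant would instead rewrite the whole trailing matrix after each panel.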