Sparse LU factorization for parallel circuit simulation on GPU
Author(s) - Ling Ren, Xiaohong Chen, Yu Wang, Chenxi Zhang, Huazhong Yang
Publication year - 2012
Publication title - CiteSeerX (The Pennsylvania State University)
Language(s) - English
Resource type - Conference proceedings
ISSN - 0738-100X
ISBN - 978-1-4503-1199-1
DOI - 10.1145/2228360.2228565
Subject(s) - parallel computing , computer science , speedup , scalability , bottleneck , solver , lu decomposition , multi core processor , factorization , cuda , sparse matrix , shared memory , computational science , algorithm , matrix decomposition , embedded system , operating system , gaussian , programming language , eigenvalues and eigenvectors , physics , quantum mechanics
The sparse solver has become the bottleneck of SPICE simulators. There has been little work on GPU-based sparse solvers because of the high data dependency. This strong data dependency means that parallel sparse LU factorization runs efficiently on shared-memory computing devices, but the number of CPU cores sharing the same memory is often limited. State-of-the-art Graphics Processing Units (GPUs) naturally have numerous cores sharing the device memory and provide a possible solution to the problem. In this paper, we propose a GPU-based sparse LU solver for circuit simulation. We optimize the work partitioning, the number of active thread groups, and the memory access pattern based on the GPU architecture. On matrices whose factorization involves many floating-point operations, our GPU-based sparse LU factorization achieves a 7.90× speedup over a 1-core CPU and a 1.49× speedup over an 8-core CPU. We also analyze the scalability of parallel sparse LU factorization and investigate the CPU and GPU specifications that most influence performance.
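To make the data dependency mentioned in the abstract concrete, the following is a minimal Python sketch of a column-by-column (left-looking) LU factorization. The abstract does not say which factorization variant the paper uses, so this variant is an assumption chosen for illustration. The sketch also assumes dense list-of-lists storage and no pivoting. The function names `left_looking_lu` and `dependency_levels` are hypothetical and do not come from the paper. Column k can only be finished after every earlier column j with a nonzero U[j][k], and columns that end up on the same dependency level could in principle be processed in parallel.

```python
def left_looking_lu(A):
    """Factor square A (list of rows) as A = L * U, without pivoting.

    Illustrative assumption: dense storage, and nonzeros detected from
    numeric values rather than from a symbolic sparsity analysis.
    Returns L, U and, per column, the earlier columns it depends on.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    deps = []
    for k in range(n):
        # Working copy of column k of A.
        x = [float(A[i][k]) for i in range(n)]
        col_deps = []
        for j in range(k):
            # Column k needs column j only if the entry U[j][k] is nonzero.
            if x[j] != 0.0:
                col_deps.append(j)
                for i in range(j + 1, n):
                    x[i] -= L[i][j] * x[j]
        # Upper part (including the diagonal) becomes column k of U.
        for i in range(k + 1):
            U[i][k] = x[i]
        if x[k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        # Lower part, scaled by the pivot, becomes column k of L.
        for i in range(k + 1, n):
            L[i][k] = x[i] / x[k]
        deps.append(col_deps)
    return L, U, deps


def dependency_levels(deps):
    """Assign each column a level; columns on one level are independent."""
    level = []
    for d in deps:
        level.append(1 + max((level[j] for j in d), default=-1))
    return level


if __name__ == "__main__":
    A = [[4.0, 0.0, 1.0],
         [0.0, 2.0, 0.0],
         [2.0, 0.0, 3.0]]
    L, U, deps = left_looking_lu(A)
    print(deps)
    print(dependency_levels(deps))
```

In this small example, columns 0 and 1 have no dependencies and share a level, while column 2 must wait for column 0. A sparse implementation would store only the nonzeros and derive the dependencies from the sparsity pattern; the dense storage here only keeps the sketch short.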