Conjugate gradients on multiple GPUs

Author(s) - Georgescu Serban, Okuda Hiroshi

Publication year - 2010

Publication title - International Journal for Numerical Methods in Fluids

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.938

H-Index - 112

eISSN - 1097-0363

pISSN - 0271-2091

DOI - 10.1002/fld.2462

Subject(s) - speedup, computer science, solver, conjugate gradient method, parallel computing, scalability, computation, computational science, discretization, sparse matrix, benchmark (surveying), acceleration, algorithm, matrix multiplication, multiplication (music), mathematics, gaussian, physics, mathematical analysis, geodesy, quantum mechanics, database, classical mechanics, combinatorics, quantum, programming language, geography

A GPU‐accelerated Conjugate Gradient solver is tested on eight matrices with different structural and numerical characteristics. The first four matrices are obtained by discretizing the 3D Poisson equation, which arises in many fields such as computational fluid dynamics and heat transfer. Their relatively low bandwidth and low condition numbers make them ideal targets for GPU acceleration. We chose another four matrices from the other end of the spectrum, both ill‐conditioned and with very large bandwidth. This paper concentrates on the computational aspects of running the solver on multiple GPUs. We develop a fast distributed sparse matrix‐vector multiplication routine using optimized data formats that allow communication to be overlapped with computation and, at the same time, some of the work to be shared with the CPU. Through a thorough analysis of the time spent in communication and computation, we show that the proposed overlapped implementation outperforms the non‐overlapped one by a large margin and provides almost perfect strong scalability for large Poisson‐type matrices. We then benchmark the performance of the entire solver, using both double precision and single precision combined with iterative refinement, and report up to 22× acceleration when using three GPUs as compared with one of the most powerful Intel Nehalem CPUs available today. Finally, we show that using GPUs as accelerators brings not only an order of magnitude speedup but also an up to 5× increase in power efficiency and an over 10× increase in cost effectiveness. Copyright © 2010 John Wiley & Sons, Ltd.
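The abstract does not give the paper's data formats or kernels, so the following is only a minimal CPU sketch of the general idea, written in plain Python under several assumptions. It builds a 7-point 3D Poisson matrix, partitions its rows into contiguous blocks (one per hypothetical device), and splits each block into a "local" part, whose columns the block owns, and a "remote" part, which needs vector entries owned by other blocks. In a real multi-GPU code the local product could run while the remote entries are being transferred; here both phases simply run one after the other. All function names (`poisson3d_csr`, `split_rows`, `make_partitioned_spmv`, `conjugate_gradient`) are illustrative and do not come from the paper.

```python
import math


def poisson3d_csr(n):
    """7-point finite-difference Laplacian on an n*n*n grid, in CSR form."""
    stencil = ((0, 0, -1), (0, -1, 0), (-1, 0, 0), (0, 0, 0),
               (1, 0, 0), (0, 1, 0), (0, 0, 1))
    indptr, indices, data = [0], [], []
    for k in range(n):
        for j in range(n):
            for i in range(n):
                for di, dj, dk in stencil:
                    ii, jj, kk = i + di, j + dj, k + dk
                    if 0 <= ii < n and 0 <= jj < n and 0 <= kk < n:
                        indices.append(ii + n * (jj + n * kk))
                        data.append(6.0 if (di, dj, dk) == (0, 0, 0) else -1.0)
                indptr.append(len(indices))
    return indptr, indices, data


def split_rows(indptr, indices, data, lo, hi):
    """Split rows [lo, hi) into local entries (column in [lo, hi)) and remote ones."""
    local, remote = [], []
    for r in range(lo, hi):
        loc_row, rem_row = [], []
        for p in range(indptr[r], indptr[r + 1]):
            target = loc_row if lo <= indices[p] < hi else rem_row
            target.append((indices[p], data[p]))
        local.append(loc_row)
        remote.append(rem_row)
    return local, remote


def make_partitioned_spmv(indptr, indices, data, parts):
    """Return a function computing y = A x block by block, in two phases."""
    n = len(indptr) - 1
    bounds = [n * p // parts for p in range(parts + 1)]
    blocks = [(bounds[p], split_rows(indptr, indices, data, bounds[p], bounds[p + 1]))
              for p in range(parts)]

    def spmv(x):
        y = [0.0] * n
        for lo, (local, remote) in blocks:
            # Phase 1: needs only entries of x owned by this block, so on real
            # hardware it could overlap with the transfer of remote entries.
            for r, row in enumerate(local):
                y[lo + r] = sum(v * x[c] for c, v in row)
            # Phase 2: needs entries of x owned by other blocks (the "halo").
            for r, row in enumerate(remote):
                y[lo + r] += sum(v * x[c] for c, v in row)
        return y

    return spmv


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def conjugate_gradient(spmv, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the zero initial guess
    p = list(r)              # first search direction
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if math.sqrt(rr) <= tol * b_norm:
            return x, it
        Ap = spmv(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter
```

A possible use is `spmv = make_partitioned_spmv(*poisson3d_csr(8), parts=3)` followed by `x, iters = conjugate_gradient(spmv, b)` for a right-hand side `b` of length 512. The sketch covers only the partitioned product and the CG iteration; the CPU work sharing and the single precision solver with iterative refinement mentioned in the abstract are not modelled.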
