A scalable Helmholtz solver in GRAPES over large‐scale multicore cluster
Author(s) - Li Linfeng, Xue Wei, Ranjan Rajiv, Jin Zhiyan
Publication year - 2013
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.2979
Subject(s) - preconditioner, solver, computer science, linear system, parallel computing, discretization, reduction (mathematics), speedup, overhead (engineering), mathematical optimization, scalability, incomplete lu factorization, algorithm, mathematics, computational science, iterative method, matrix decomposition, eigenvalues and eigenvectors, operating system, mathematical analysis, physics, geometry, quantum mechanics, database
SUMMARY This paper discusses performance optimization of the dynamical core of the global numerical weather prediction model in the Global/Regional Assimilation and Prediction System (GRAPES). GRAPES is a new-generation numerical weather prediction system developed and currently used by the China Meteorological Administration. The computational performance of the dynamical core in GRAPES relies on the efficient solution of three-dimensional Helmholtz equations, whose discretization in space and time leads to large-scale sparse linear systems. We choose the generalized conjugate residual (GCR) algorithm to solve these linear systems and propose algorithm optimizations for large-scale parallelism in two respects: (i) reducing the number of iterations needed for a solution and (ii) improving the performance of each GCR iteration. The iteration count is reduced by advanced preconditioning techniques that combine a block incomplete LU factorization-k (ILU-k) preconditioner over the 7 diagonals of the coefficient matrix with the restricted additive Schwarz method. Each GCR iteration is improved by refactoring the GCR algorithm to reduce the number of global communication operations, which decreases the communication overhead on large numbers of cores. Performance evaluation on the Tianhe-1A system shows that the new preconditioning techniques reduce the number of iterations needed to solve the linear systems by almost one-third, that the proposed methods obtain a 25% performance improvement on average compared with the original version of the Helmholtz solver in GRAPES, and that the speedup with our algorithms can reach 10 on 2048 cores relative to 256 cores. Copyright © 2013 John Wiley & Sons, Ltd.
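
As a rough illustration of the solver named in the summary, the following is a minimal serial sketch of a right-preconditioned GCR iteration in pure Python. It is not the authors' implementation: the function names (`gcr`, `jacobi_preconditioner`, `helmholtz_like_matrix`) are hypothetical, the test matrix is a small one-dimensional, nonsymmetric stand-in for the discretized Helmholtz operator, and a simple Jacobi (diagonal) preconditioner is assumed in place of the block ILU-k and restricted additive Schwarz combination described in the paper. Each call to `dot` marks a point where a distributed code would need a global reduction, which is the kind of communication the paper's refactoring aims to reduce.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product. In a distributed solver this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_preconditioner(A):
    """Return a function applying M^{-1} = diag(A)^{-1} (assumed stand-in)."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def helmholtz_like_matrix(n):
    """Small nonsymmetric, diagonally dominant tridiagonal test matrix."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 3.0 + 0.5 * i
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -0.5
    return A


def gcr(A, b, precond, tol=1e-10, max_iter=100):
    """Right-preconditioned GCR. Returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                       # residual for the zero initial guess
    b_norm = dot(b, b) ** 0.5 or 1.0
    ps, qs, qqs = [], [], []          # search directions p_j, q_j = A p_j, <q_j, q_j>
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        p = precond(r)                # preconditioned residual
        q = matvec(A, p)
        # Orthogonalize q against all previous q_j, updating p consistently
        # so that q == A p still holds afterwards.
        for pj, qj, qqj in zip(ps, qs, qqs):
            beta = dot(q, qj) / qqj
            p = [pi - beta * pji for pi, pji in zip(p, pj)]
            q = [qi - beta * qji for qi, qji in zip(q, qj)]
        qq = dot(q, q)
        alpha = dot(r, q) / qq        # step length minimizing ||r - alpha q||
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        ps.append(p)
        qs.append(q)
        qqs.append(qq)
    return x, max_iter


if __name__ == "__main__":
    A = helmholtz_like_matrix(8)
    b = [1.0] * 8
    x, iters = gcr(A, b, jacobi_preconditioner(A))
    residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    print("iterations:", iters, "residual norm:", dot(residual, residual) ** 0.5)
```

In this sketch each iteration performs several separate inner products; a communication-reducing variant would typically batch them into fewer reductions, though the exact restructuring used in the paper is not given in the summary.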
