Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units
Author(s) -
David Beckingsale,
Wayne Gaudin,
Rich Hornung,
B Gunney,
Todd Gamblin,
J. A. Herdman,
Stephen A. Jarvis
Publication year - 2014
Publication title -
OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)
Language(s) - English
Resource type - Reports
DOI - 10.2172/1184094
Subject(s) - computer science , cuda , adaptive mesh refinement , parallel computing , gpu cluster , block (permutation group theory) , computational science , supercomputer , overhead (engineering) , titan (rocket family) , memory hierarchy , data structure , general purpose computing on graphics processing units , graphics , computer graphics (images) , operating system , geometry , mathematics , engineering , aerospace engineering , cache
Block-structured adaptive mesh refinement (AMR) is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, introducing complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory's Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
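To illustrate the kind of data-parallel coarsening operator the abstract refers to, the following is a minimal C++ sketch of a 2:1 restriction on a 2-D cell-centered patch, where each coarse zone takes the arithmetic mean of the four fine zones it covers. The function name, array layout, and averaging scheme are assumptions for illustration, not the library's actual API; in the GPU implementation described by the paper, the per-coarse-zone loop body would run as a data-parallel kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2:1 coarsening (restriction) for a 2-D cell-centered patch
// stored row-major. Each coarse cell averages the 2x2 block of fine cells
// beneath it, conserving the cell-averaged quantity.
std::vector<double> coarsen(const std::vector<double>& fine,
                            std::size_t fine_nx, std::size_t fine_ny) {
    assert(fine_nx % 2 == 0 && fine_ny % 2 == 0);
    assert(fine.size() == fine_nx * fine_ny);
    const std::size_t cnx = fine_nx / 2, cny = fine_ny / 2;
    std::vector<double> coarse(cnx * cny);
    for (std::size_t j = 0; j < cny; ++j) {
        for (std::size_t i = 0; i < cnx; ++i) {
            const std::size_t fi = 2 * i, fj = 2 * j;
            // Mean of the 2x2 fine-cell block covered by coarse cell (i, j).
            coarse[j * cnx + i] = 0.25 *
                (fine[fj * fine_nx + fi]       + fine[fj * fine_nx + fi + 1] +
                 fine[(fj + 1) * fine_nx + fi] + fine[(fj + 1) * fine_nx + fi + 1]);
        }
    }
    return coarse;
}
```

The corresponding refinement (prolongation) operator would run the mapping in the other direction, typically with a conservative linear interpolation rather than simple injection.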

