A thread‐block‐wise computational framework for large‐scale hierarchical continuum‐discrete modeling of granular media
Author(s) - Zhao Shiwei, Zhao Jidong, Liang Weijian
Publication year - 2020
Publication title - International Journal for Numerical Methods in Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.421
H-Index - 168
eISSN - 1097-0207
pISSN - 0029-5981
DOI - 10.1002/nme.6549
Subject(s) - parallel computing , computer science , thread (computing) , speedup , scalability , computation , multi core processor , supercomputer , computational science , algorithm , database , operating system
This article presents a novel, scalable parallel computing framework for large‐scale and multiscale simulations of granular media. Key to the new framework is an innovative thread‐block‐wise representative volume element (RVE) parallelism, inspired by the resemblance between a typical multiscale computational hierarchy and the hierarchical thread structure of graphics processing units (GPUs). To solve a hierarchical multiscale problem, all computation in an RVE is assigned to a single block of threads, so that the RVE runs entirely on the GPU and avoids frequent data exchange with the host CPU. The thread blocks can meanwhile run asynchronously, which implicitly guarantees the independence of inter‐RVE computation that characterizes the hierarchical multiscale structure. The parallel computing algorithms are formulated and implemented in an in‐house code, GoDEM, using GPU‐specific techniques such as coalesced memory access, shared memory utilization, and unified memory. Benchmark and performance tests are conducted against an open‐source CPU‐based discrete element method (DEM) code under three typical loading conditions. The performance of GoDEM is examined with varying thread‐block size, GPU register pressure, and RVE number. The results reveal that increasing GPU occupancy by decreasing register pressure significantly degrades, rather than improves, performance. We further demonstrate that the proposed GPU parallelism framework may achieve a saturated speedup of approximately 350 compared with the single‐CPU‐core code. To demonstrate its application to multiscale modeling of granular media, the material point method is coupled with the DEM powered by the new framework to simulate a typical engineering‐scale problem involving tens of millions of particles in total. The simulation shows that a speedup of approximately 91 can be achieved with the proposed framework, compared with a similar CPU program running on a cluster node with 44 parallel threads. The study offers a viable solution for future large‐scale and multiscale modeling of granular media.
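
The link between register pressure, thread‐block size, and GPU occupancy mentioned in the abstract can be illustrated with a simplified estimate. The sketch below is not taken from GoDEM: the function name `estimate_occupancy` is hypothetical, and the per‐multiprocessor limits (65536 registers, 2048 resident threads, 32 resident blocks) are assumed illustrative values that should be checked against the actual device. It ignores shared‐memory limits and register allocation granularity, so it gives only an upper‐bound estimate.

```python
def estimate_occupancy(threads_per_block, regs_per_thread,
                       regs_per_sm=65536,        # assumed limit, device-dependent
                       max_threads_per_sm=2048,  # assumed limit, device-dependent
                       max_blocks_per_sm=32,     # assumed limit, device-dependent
                       warp_size=32):
    """Simplified theoretical occupancy: resident warps / maximum warps per SM.

    Shared-memory usage and register allocation granularity are ignored.
    """
    # Threads are scheduled in whole warps, so round the block size up.
    warps_per_block = -(-threads_per_block // warp_size)
    threads_rounded = warps_per_block * warp_size

    # Number of blocks that fit under each resource limit.
    regs_per_block = threads_rounded * regs_per_thread
    limit_by_regs = regs_per_sm // regs_per_block
    limit_by_threads = max_threads_per_sm // threads_rounded
    resident_blocks = min(limit_by_regs, limit_by_threads, max_blocks_per_sm)

    max_warps_per_sm = max_threads_per_sm // warp_size
    return resident_blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    # Fewer registers per thread allow more resident warps (higher occupancy).
    for regs in (32, 64, 128):
        print(regs, estimate_occupancy(256, regs))
```

Under these assumed limits, halving the registers per thread doubles the estimated occupancy. The abstract reports that a higher occupancy obtained this way degraded performance, so an estimate of this kind is a starting point for tuning and not a predictor of run time.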