A thread‐block‐wise computational framework for large‐scale hierarchical continuum‐discrete modeling of granular media

Zhao Shiwei; Zhao Jidong; Liang Weijian

Publication year - 2020

Publication title - International Journal for Numerical Methods in Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.421

H-Index - 168

eISSN - 1097-0207

pISSN - 0029-5981

DOI - 10.1002/nme.6549

Subject(s) - parallel computing, computer science, thread (computing), speedup, scalability, computation, multi-core processor, supercomputer, computational science, algorithm, database, operating system

This article presents a novel, scalable parallel computing framework for large‐scale and multiscale simulations of granular media. Key to the new framework is an innovative thread‐block‐wise representative volume element (RVE) parallelism, inspired by the resemblance between a typical multiscale computational hierarchy and the hierarchical thread structure of graphics processing units (GPUs). To solve a hierarchical multiscale problem, all computation in an RVE is assigned to a single block of threads, so that the RVE runs entirely on the GPU and frequent data exchange with the host CPU is avoided. The thread blocks can run asynchronously, which implicitly respects the independence of inter‐RVE computations that characterizes the hierarchical multiscale structure. The parallel computing algorithms are formulated and implemented in an in‐house code, GoDEM, using GPU‐specific techniques such as coalesced memory access, shared memory utilization, and unified memory. Benchmark and performance tests are conducted against an open‐source CPU‐based DEM code under three typical loading conditions. The performance of GoDEM is examined with respect to thread‐block size, GPU register pressure, and the number of RVEs. The tests reveal that increasing GPU occupancy by decreasing register pressure significantly degrades, rather than improves, performance. We further demonstrate that the proposed GPU parallelism framework may achieve a saturated speedup of approximately 350 over the single‐CPU‐core code. As a demonstration of its application to multiscale modeling of granular media, the material point method (MPM) is coupled with the DEM powered by the new framework to simulate a typical engineering‐scale problem involving tens of millions of particles in total.
This coupled simulation shows that the proposed framework can achieve a speedup of approximately 91 compared with a similar CPU program running on a cluster node with 44 parallel threads. The study offers a viable solution for future large‐scale and multiscale modeling of granular media.
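
The thread‐block‐wise RVE mapping summarized above can be illustrated with a minimal serial Python sketch. This is not GoDEM's implementation: the one‐dimensional particles, the linear spring contact law, the parameter values, and the names `RVE`, `block_step`, `launch`, and `make_rves` are all hypothetical. Each RVE stands in for one thread block. Within a block, simulated threads visit particles in a strided loop (the access pattern that enables coalescing on a GPU), and because RVEs share no data, the blocks can be executed in any order, mirroring asynchronous block execution.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class RVE:
    """Hypothetical toy RVE: 1D particles on a line with a linear contact law."""
    x: list[float]            # particle positions
    v: list[float]            # particle velocities
    radius: float = 0.5
    stiffness: float = 100.0  # hypothetical normal contact stiffness
    mass: float = 1.0


def block_step(rve, block_dim, dt):
    """Advance one RVE by one explicit time step, as one thread block would.

    Simulated thread t handles particles t, t + block_dim, t + 2 * block_dim, ...
    so neighbouring threads touch neighbouring array entries, the pattern that
    allows coalesced memory access on a GPU.
    """
    n = len(rve.x)
    forces = [0.0] * n
    # Phase 1: each simulated thread sums contact forces on its own particles.
    for t in range(block_dim):
        for i in range(t, n, block_dim):
            f = 0.0
            for j in range(n):
                if j == i:
                    continue
                gap = abs(rve.x[i] - rve.x[j]) - 2.0 * rve.radius
                if gap < 0.0:  # overlap: the spring pushes particle i away from j
                    direction = 1.0 if rve.x[i] > rve.x[j] else -1.0
                    f -= rve.stiffness * gap * direction
            forces[i] = f
    # On a GPU, a __syncthreads() barrier would separate the two phases so that
    # no position changes before every force in the block has been computed.
    # Phase 2: each simulated thread integrates its own particles.
    for t in range(block_dim):
        for i in range(t, n, block_dim):
            rve.v[i] += dt * forces[i] / rve.mass
            rve.x[i] += dt * rve.v[i]


def launch(rves, block_dim, dt, order=None):
    """One simulated 'kernel launch': one block per RVE, run in any order.

    RVEs share no data, so the result does not depend on the block order.
    """
    for b in (range(len(rves)) if order is None else order):
        block_step(rves[b], block_dim, dt)


def make_rves(count, n_particles=16, seed=0):
    """Build hypothetical RVEs whose particles are packed tightly enough to overlap."""
    rng = random.Random(seed)
    rves = []
    for _ in range(count):
        x = [0.9 * i + rng.uniform(-0.05, 0.05) for i in range(n_particles)]
        rves.append(RVE(x=x, v=[0.0] * n_particles))
    return rves


if __name__ == "__main__":
    a = make_rves(8)
    b = copy.deepcopy(a)
    launch(a, block_dim=4, dt=1e-3)                                 # blocks 0..7
    launch(b, block_dim=4, dt=1e-3, order=reversed(range(len(b))))  # blocks 7..0
    print(all(ra.x == rb.x for ra, rb in zip(a, b)))  # prints True
```

On an actual GPU, the same idea would roughly correspond to a kernel launched with one block per RVE, where `blockIdx.x` selects the RVE and `threadIdx.x`, stepping by `blockDim.x`, selects particles. The abstract does not state whether GoDEM organizes its kernels exactly this way, so treat the sketch only as an illustration of the mapping.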
