A three‐stage graphics processing unit‐based finite element analyses matrix generation strategy for unstructured meshes

Sanfui Subhajit; Sharma Deepak

Author(s) - Sanfui Subhajit, Sharma Deepak

Publication year - 2020

Publication title - International Journal for Numerical Methods in Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.421

H-Index - 168

eISSN - 1097-0207

pISSN - 0029-5981

DOI - 10.1002/nme.6383

Subject(s) - computer science, speedup, computational science, parallel computing, graphics, finite element method, polygon mesh, kernel (algebra), graphics processing unit, matrix (chemical analysis), computation, algorithm, mesh generation, cuda, theoretical computer science, computer graphics (images), mathematics, materials science, combinatorics, composite material, physics, thermodynamics

Summary - With the development of parallel computing architectures, larger and more complex finite element analyses (FEA) are being performed with higher accuracy and shorter execution times. Graphics processing units (GPUs) are among the major contributors to this computational breakthrough. This work presents a three‐stage GPU‐based FEA matrix generation strategy whose key idea is to decouple the computation of the global matrix indices from the computation of its values by means of a novel data structure referred to as the neighbor matrix. The first stage computes the neighbor matrix on the GPU from the unstructured mesh. Using this neighbor matrix, the indices and values of the global matrix are then computed separately, in the second and third stages. The neighbor matrix is computed for three different element types. Two versions are implemented, performing numerical integration and assembly either in the same kernel or in separate kernels, and simulations are run for different mesh sizes with up to three million degrees of freedom on a single GPU. Comparison with a GPU‐based parallel implementation from the literature reveals speedups ranging from 4× to 6× for the proposed workload division strategy. Furthermore, the same‐kernel implementation is found to outperform the separate‐kernel implementation by 70% to 150% for different element types.
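The three-stage idea in the summary can be illustrated with a small serial sketch. This is not the authors' GPU implementation, and the summary does not define the neighbor matrix precisely. The sketch assumes that the neighbor structure lists, for each node, the nodes that share an element with it, that the global matrix is stored in compressed sparse row (CSR) form with one degree of freedom per node, and that indices are built in the second stage and values in the third. The function names and the two-node bar element are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def build_neighbor_table(elements, num_nodes):
    """Stage 1 (assumed form): for each node, the sorted nodes sharing an element with it."""
    neighbors = [set() for _ in range(num_nodes)]
    for elem in elements:
        for node in elem:
            neighbors[node].update(elem)  # includes the node itself (diagonal entry)
    return [sorted(s) for s in neighbors]


def build_csr_indices(neighbor_table):
    """Stage 2: CSR row pointers and column indices, derived from the neighbor table only."""
    row_ptr, col_idx = [0], []
    for row in neighbor_table:
        col_idx.extend(row)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def fill_csr_values(elements, local_matrix, row_ptr, col_idx):
    """Stage 3: accumulate element matrices into the preallocated CSR value array."""
    values = [0.0] * len(col_idx)
    for elem in elements:
        k_local = local_matrix(elem)
        for i, row_node in enumerate(elem):
            start, end = row_ptr[row_node], row_ptr[row_node + 1]
            for j, col_node in enumerate(elem):
                # Columns within a row are sorted, so a binary search finds the slot.
                pos = bisect_left(col_idx, col_node, start, end)
                values[pos] += k_local[i][j]
    return values


def bar_stiffness(elem):
    """Hypothetical unit-stiffness two-node bar element, used only as test data."""
    return [[1.0, -1.0], [-1.0, 1.0]]


# Four nodes in a line, with elements listed in arbitrary order.
elements = [(2, 3), (0, 1), (1, 2)]
table = build_neighbor_table(elements, num_nodes=4)
row_ptr, col_idx = build_csr_indices(table)
values = fill_csr_values(elements, bar_stiffness, row_ptr, col_idx)
```

Because the sparsity pattern is fixed before any element matrix is evaluated, the value stage only writes into known slots. In a parallel version, concurrent additions to the same slot would need some form of synchronization; how the paper divides this work among GPU threads is not described in the summary.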
