A three‐stage graphics processing unit‐based finite element analyses matrix generation strategy for unstructured meshes
Author(s) - Sanfui Subhajit, Sharma Deepak
Publication year - 2020
Publication title - International Journal for Numerical Methods in Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.421
H-Index - 168
eISSN - 1097-0207
pISSN - 0029-5981
DOI - 10.1002/nme.6383
Subject(s) - computer science, speedup, computational science, parallel computing, graphics, finite element method, polygon mesh, kernel (algebra), graphics processing unit, matrix (chemical analysis), computation, algorithm, mesh generation, cuda, theoretical computer science, computer graphics (images), mathematics, materials science, combinatorics, composite material, physics, thermodynamics
Summary With the development of parallel computing architectures, larger and more complex finite element analyses (FEA) are being performed with higher accuracy and shorter execution times. Graphics processing units (GPUs) are among the major contributors to this computational breakthrough. This work presents a three‐stage GPU‐based FEA matrix generation strategy whose key idea is to decouple the computation of the global matrix indices from that of its values by means of a novel data structure referred to as the neighbor matrix. The first stage computes the neighbor matrix on the GPU from the unstructured mesh. Using this neighbor matrix, the indices and values of the global matrix are computed separately in the second and third stages. The neighbor matrix is computed for three different element types. Two versions are implemented, one performing numerical integration and assembly in the same kernel and one performing them in separate kernels, and simulations are run on a single GPU for mesh sizes of up to three million degrees of freedom. Comparison with a GPU‐based parallel implementation from the literature reveals speedups ranging from 4× to 6× for the proposed workload division strategy. Furthermore, the same‐kernel implementation is found to outperform the separate‐kernel implementation by 70% to 150%, depending on the element type.
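
The decoupling of indices from values described in the summary can be illustrated with a small serial sketch. It is not the authors' GPU implementation: the summary does not define the layout of the neighbor matrix, so the sketch assumes that row i simply lists the nodes sharing an element with node i, that there is one degree of freedom per node, that the global matrix is stored in CSR format, and that indices are built in the second stage and values in the third. All function names and the toy element matrix are hypothetical.

```python
def build_neighbor_matrix(elements, num_nodes):
    """Stage 1 (assumed form): row i holds the sorted nodes that share
    an element with node i, including i itself."""
    neighbors = [set() for _ in range(num_nodes)]
    for elem in elements:
        for node in elem:
            neighbors[node].update(elem)
    return [sorted(s) for s in neighbors]


def build_csr_indices(neighbor_matrix):
    """Stage 2: CSR row pointers and column indices, derived from the
    neighbor matrix alone (no element matrices needed)."""
    row_ptr = [0]
    col_idx = []
    for row in neighbor_matrix:
        col_idx.extend(row)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def assemble_values(elements, element_matrices, neighbor_matrix, row_ptr):
    """Stage 3: accumulate element contributions into the preallocated
    CSR value array, locating each slot through the neighbor matrix."""
    values = [0.0] * row_ptr[-1]
    for elem, ke in zip(elements, element_matrices):
        for a, i in enumerate(elem):
            row = neighbor_matrix[i]
            for b, j in enumerate(elem):
                values[row_ptr[i] + row.index(j)] += ke[a][b]
    return values


# Toy unstructured mesh: two triangles sharing the edge between nodes 1 and 2.
elements = [(0, 1, 2), (1, 3, 2)]
# Hypothetical 3x3 element matrix, reused for both elements.
ke = [[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, -1.0, 2.0]]

neighbor_matrix = build_neighbor_matrix(elements, num_nodes=4)
row_ptr, col_idx = build_csr_indices(neighbor_matrix)
values = assemble_values(elements, [ke, ke], neighbor_matrix, row_ptr)
```

Because the sparsity pattern is fixed before any element matrix is evaluated, every write in the last step targets a known slot. On a GPU, concurrent additions to the same slot would still need atomic operations or a conflict-free work division, which this serial sketch does not model.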
