Accelerating unstructured finite volume computations on field‐programmable gate arrays
Author(s) -
Nagy Zoltán,
Nemes Csaba,
Hiba Antal,
Csík Árpád,
Kiss András,
Ruszinkó Miklós,
Szolgay Péter
Publication year - 2013
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3022
Subject(s) - computer science, speedup, parallel computing, computation, gate array, field programmable gate array, graphics processing unit, graph, computational science, algorithm, computer hardware, theoretical computer science
SUMMARY This paper describes a field‐programmable gate array (FPGA)‐based framework for efficiently accelerating unstructured finite volume computations in which the same mathematical expression has to be evaluated at every point of the mesh. The irregular memory access patterns caused by the unstructured spatial discretization are eliminated by a novel mesh node reordering technique, and a special architecture is designed to take full advantage of the resulting predictable memory access patterns. In the proposed architecture, a fixed‐size moving window over the input stream of reordered state variables is cached in on‐chip memory, and a pipelined chain of processing elements, which takes its input only from the fast on‐chip memory, carries out the computations. The arithmetic unit (AU) of each processing element is generated from the data flow graph extracted from the given mathematical expression. The data flow graph is partitioned with a novel graph partitioning algorithm to break up the AU into smaller, locally controlled parts, which can be implemented more efficiently on an FPGA than a globally controlled AU. The proposed architecture and algorithms are presented via a case study solving the Euler equations on an unstructured mesh. On the largest FPGA currently available, the generated architecture contains three processing elements working in a pipelined fashion and provides a speedup of one order of magnitude over a high‐performance microprocessor and a threefold speedup over a high‐performance graphics processing unit. Copyright © 2013 John Wiley & Sons, Ltd.
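The link between node reordering and a fixed‐size moving window can be illustrated with a small sketch. It is not the reordering technique of the paper, which the summary does not detail; a plain breadth‐first renumbering in the style of Cuthill-McKee is swapped in as a stand‐in. The function names (`bandwidth`, `window_size`, `bfs_reorder`), the toy mesh, and the assumption that the window is centered on the node being processed are all hypothetical and chosen only for illustration.

```python
from collections import deque


def bandwidth(edges, order):
    """Largest distance, in stream positions, between two adjacent mesh nodes."""
    pos = {node: i for i, node in enumerate(order)}
    return max(abs(pos[a] - pos[b]) for a, b in edges)


def window_size(edges, order):
    """Entries a window centered on the current node must hold so that
    every neighbor of that node is inside it (assumed window layout)."""
    return 2 * bandwidth(edges, order) + 1


def bfs_reorder(num_nodes, edges, start=0):
    """Breadth-first renumbering (Cuthill-McKee style stand-in, not the
    paper's method): neighbors are visited lowest degree first."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    order = []
    seen = [False] * num_nodes
    # Trying every node as a root also covers disconnected meshes.
    for root in [start] + list(range(num_nodes)):
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in sorted(adj[node], key=lambda n: len(adj[n])):
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
    return order


# Toy "mesh": a chain of 8 nodes whose labels are scrambled.
edges = [(0, 4), (4, 1), (1, 5), (5, 2), (2, 6), (6, 3), (3, 7)]
original = list(range(8))
reordered = bfs_reorder(8, edges)

print(window_size(edges, original))   # window needed for the original order
print(window_size(edges, reordered))  # window needed after reordering
```

A smaller bandwidth means that a smaller slice of the input stream has to be kept in on‐chip memory at any time, which is why the quality of the ordering bounds the mesh sizes such a windowed architecture can handle.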