Accelerating unstructured finite volume computations on field‐programmable gate arrays


Author(s) - Nagy Zoltán, Nemes Csaba, Hiba Antal, Csík Árpád, Kiss András, Ruszinkó Miklós, Szolgay Péter

Publication year - 2013

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3022

Subject(s) - computer science, speedup, parallel computing, computation, gate array, field programmable gate array, graphics processing unit, graph, computational science, algorithm, computer hardware, theoretical computer science

SUMMARY In the paper, a field‐programmable gate array (FPGA)‐based framework is described to efficiently accelerate unstructured finite volume computations where the same mathematical expression has to be evaluated at every point of the mesh. The irregular memory access patterns caused by the unstructured spatial discretization are eliminated by a novel mesh node reordering technique, and a special architecture is designed to fully exploit the resulting predictable memory access patterns. In the proposed architecture, a fixed‐size moving window of the input stream of the reordered state variables is cached in on‐chip memory, and a pipelined chain of processing elements, which takes its input only from the fast on‐chip memory, carries out the computations. The arithmetic unit (AU) of each processing element is generated from the data flow graph extracted from the given mathematical expression. The data flow graph is partitioned with a novel graph partitioning algorithm to break up the AU into smaller, locally controlled parts, which can be implemented more efficiently on an FPGA than a globally controlled AU. The proposed architecture and algorithms are presented via a case study solving the Euler equations on an unstructured mesh. On the largest FPGA currently available, the generated architecture contains three processing elements working in a pipelined fashion and provides a speedup of one order of magnitude over a high performance microprocessor and a threefold speedup over a high performance graphics processing unit. Copyright © 2013 John Wiley & Sons, Ltd.
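
The link between node reordering and a fixed‐size moving window can be illustrated with a small sketch. The summary does not say which reordering the authors use, so the code below substitutes the classic Cuthill-McKee breadth‐first ordering as a stand‐in; the toy mesh and the names `path_mesh`, `cuthill_mckee` and `bandwidth` are hypothetical and chosen only for illustration. In this simplified model, if every neighbour of a node lies within `b` positions of it in the stream, a window of `2 * b + 1` consecutive entries is enough to hold everything a processing element needs for that node.

```python
from collections import deque


def bandwidth(adjacency, order):
    """Largest index distance between neighbouring nodes under an ordering."""
    position = {node: i for i, node in enumerate(order)}
    return max(
        (abs(position[u] - position[v]) for u in adjacency for v in adjacency[u]),
        default=0,
    )


def cuthill_mckee(adjacency):
    """Breadth-first reordering that visits low-degree neighbours first.

    Stand-in technique: NOT the reordering method of the paper.
    """
    def by_degree(node):
        return (len(adjacency[node]), node)

    visited = set()
    order = []
    # Start each connected component from its lowest-degree unvisited node.
    for start in sorted(adjacency, key=by_degree):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in sorted(adjacency[node], key=by_degree):
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
    return order


def path_mesh(labels):
    """Hypothetical toy mesh: a chain of nodes carrying scrambled labels."""
    adjacency = {label: set() for label in labels}
    for a, b in zip(labels, labels[1:]):
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


mesh = path_mesh([0, 5, 2, 7, 1, 6, 3, 4])
natural_order = sorted(mesh)      # nodes streamed by label
reordered = cuthill_mckee(mesh)   # nodes streamed in reordered sequence

for name, order in (("natural", natural_order), ("reordered", reordered)):
    b = bandwidth(mesh, order)
    print(f"{name}: bandwidth={b}, window size={2 * b + 1}")
```

A smaller bandwidth means a smaller on‐chip buffer and a strictly sequential read of off‐chip memory, which is the property the summary attributes to the reordered mesh. Real finite volume meshes are two‐ or three‐dimensional, so the achievable bandwidth is much larger than in this one‐dimensional toy case.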
