C2CU: a CUDA C program generator for bulk execution of a sequential algorithm | Zendy

Takafuji Daisuke | Zendy; Nakano Koji | Zendy; Ito Yasuaki | Zendy; Bordim Jacir | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Premium

C2CU: a CUDA C program generator for bulk execution of a sequential algorithm

Author(s) -

Takafuji Daisuke,

Nakano Koji,

Ito Yasuaki,

Bordim Jacir

Publication year - 2016

Publication title -

concurrency and computation: practice and experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.4022

Subject(s) - computer science , cuda , parallel computing , sorting algorithm , sorting , algorithm , matrix multiplication , generator (circuit theory) , computation , execution time , power (physics) , physics , quantum mechanics , quantum

Summary Several important tasks, including matrix computation, signal processing, sorting, dynamic programming, encryption, and decryption, can be performed by oblivious sequential algorithms. A sequential algorithm is oblivious if an address accessed at each time does not depend on the input data. A bulk execution of a sequential algorithm is to execute it for many independent inputs in turn or in parallel. A number of works have been devoted to design and implement parallel algorithms for a single input. However, none of these works evaluated the bulk execution performance of these algorithms. The first contribution of this paper is to present a time‐optimal implementation for bulk execution of an oblivious sequential algorithm. Our second contribution is to develop a tool, named C2CU, which automatically generates a CUDA C program for a bulk execution of an oblivious sequential algorithm. The C2CU has been used to generate CUDA C programs for the bulk execution of the bitonic sorting, Floyd‐Warshall, and Montgomery modulo multiplication algorithms. Compared to a sequential implementation on a single CPU, the generated CUDA C programs for the above algorithms run, respectively, 199, 54, and 78 times faster.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here

Accelerating Research