Research and implementation of a high performance parallel computing digital down converter on graphics processing unit

Author(s) - Shao Guolin, Chen Xingshu, Yang Lu

Publication year - 2016

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.4042

Subject(s) - computer science, decimation, cuda, graphics processing unit, integrator, parallel computing, graphics, finite impulse response, kernel (algebra), general purpose computing on graphics processing units, computer hardware, speedup, key (lock), filter (signal processing), computational science, algorithm, computer graphics (images), operating system, bandwidth (computing), telecommunications, mathematics, combinatorics, computer vision

Summary The digital down converter (DDC) is a time-intensive and data-intensive computing task and is considered a key technology in software-defined radio. This paper proposes a high-performance implementation of the DDC on a graphics processing unit (GPU) using CUDA. The DDC is composed of a numerically controlled oscillator (NCO) stage, a cascaded integrator-comb (CIC) decimation filter stage, and a finite impulse response (FIR) filter stage. The GPU implementation and optimization of each stage are studied in detail. Additionally, to handle long-duration signals, the signal data sequence is split into segments, and the overlap-save and overlap-add mechanisms are applied in the CIC stage and the FIR stage, respectively. Finally, experiments were conducted to evaluate the performance of the GPU-based DDC against a sequential CPU implementation and an OpenMP implementation (16 threads). The experimental results demonstrate that the DDC achieves significant improvements on the GPU: the maximum speedups of the NCO stage, CIC stage, and FIR stage exceed 1242, 527, and 179 times, respectively, including data transfer, kernel execution, and other processing operations, and the overall speedup of the DDC exceeds 180 times. The GPU speedups are also far above those of the OpenMP implementation (about 2.5-6.4 times).
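To make the three-stage structure described in the summary concrete, the following is a minimal sequential sketch in plain Python. It is not the paper's GPU code. The function names, the CIC order of 3, the differential delay of 1, and the use of floating-point arithmetic (rather than the fixed-point arithmetic common in practical CIC filters) are all assumptions made for illustration.

```python
import cmath
import math

def nco_mix(x, f_c, f_s):
    """NCO stage: multiply each sample by exp(-j*2*pi*f_c*n/f_s),
    shifting the component at f_c down to baseband (0 Hz)."""
    return [s * cmath.exp(-2j * math.pi * f_c * n / f_s)
            for n, s in enumerate(x)]

def cic_decimate(x, R, N=3):
    """CIC stage (assumed order N, differential delay 1):
    N integrators at the input rate, downsample by R,
    then N combs at the output rate. Output is scaled by
    the DC gain R**N so that a constant input passes unchanged."""
    y = list(x)
    for _ in range(N):              # integrators: running sums
        acc = 0
        out = []
        for v in y:
            acc += v
            out.append(acc)
        y = out
    y = y[::R]                      # keep every R-th sample
    for _ in range(N):              # combs: first differences
        prev = 0
        out = []
        for v in y:
            out.append(v - prev)
            prev = v
        y = out
    gain = R ** N
    return [v / gain for v in y]

def fir_filter(x, h):
    """FIR stage: y[n] = sum_k h[k] * x[n-k], with zero initial state.
    Returns as many output samples as there are inputs."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def ddc(x, f_c, f_s, R, h):
    """Chain the three stages: NCO mixing, CIC decimation, FIR filtering."""
    return fir_filter(cic_decimate(nco_mix(x, f_c, f_s), R), h)
```

As a usage illustration, feeding a real cosine at `f_c` (for instance `f_s = 1000`, `f_c = 250`, `R = 4`, and a 4-tap averaging filter `h = [0.25] * 4`) should yield a near-constant baseband value once the filter transients have passed. In the NCO and FIR stages each output sample depends only on input samples, which is the kind of independence that lends itself to one-thread-per-sample GPU kernels; the running sums in the CIC integrators do not have this property and need a different parallel formulation.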
