Open Access
CUDA memory optimisation strategies for motion estimation
Author(s) - Sayadi Fatma Elzahra, Chouchene Marwa, Bahri Haithem, Khemiri Randa, Atri Mohamed
Publication year - 2019
Publication title - IET Computers & Digital Techniques
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.219
H-Index - 46
eISSN - 1751-861X
pISSN - 1751-8601
DOI - 10.1049/iet-cdt.2017.0149
Subject(s) - cuda , computer science , parallel computing , graphics processing unit , central processing unit , general purpose computing on graphics processing units , peak signal to noise ratio , graphics , computation , architecture , motion estimation , computer engineering , computer hardware , algorithm , image (mathematics) , artificial intelligence , computer graphics (images) , art , visual arts
As video processing technologies grow faster in complexity and image resolution than central processing unit (CPU) performance, data-parallel computing methods will become even more important. In fact, the high-performance, data-parallel architecture of modern graphics processing units (GPUs) can reduce execution times by orders of magnitude or more. However, creating an optimal GPU implementation not only requires converting sequential implementations of algorithms into parallel ones but, more importantly, requires careful balancing of GPU resources. It also requires an understanding of the bottlenecks and penalties caused by memory latency and computation. The challenge is even greater when an implementation exceeds the GPU resources. In this study, the authors discuss parallelisation and memory optimisation strategies for a computer vision application for motion estimation using the NVIDIA compute unified device architecture (CUDA). The study addresses optimisation techniques for algorithms that exceed the GPU's computation or memory resources under the CUDA architecture. The proposed implementation shows a substantial improvement in both speed-up (SU) and peak signal-to-noise ratio (PSNR). Indeed, the implementation is up to 50 times faster than its CPU counterpart, and it increases the PSNR of the coded test sequence by up to 8 dB.
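
The record does not include the authors' source code, but the following sketch illustrates the kind of kernel the abstract alludes to: a full-search, SAD-based block-matching motion estimation kernel in which each thread block processes one macroblock and stages the current block in shared memory so that every candidate displacement reuses it without re-reading global memory. All specifics here (the kernel name, the 16-pixel block size, the ±8 search range, the row-major frame layout) are illustrative assumptions, not the paper's implementation.

```cuda
// Minimal sketch of full-search block matching with a shared-memory staged
// current block. Launch with blockDim = (16, 16) and one thread block per
// 16x16 macroblock, e.g. gridDim = (width/16, height/16), assuming the
// frame dimensions are multiples of 16.

#include <cuda_runtime.h>

#define BLOCK 16   // macroblock size (assumption)
#define RANGE 8    // +/- search range in pixels (assumption)

__global__ void sadMotionEstimation(const unsigned char* cur,
                                    const unsigned char* ref,
                                    int width, int height,
                                    int2* mv)
{
    const int mbX = blockIdx.x * BLOCK;                        // macroblock origin
    const int mbY = blockIdx.y * BLOCK;
    const int tid = threadIdx.y * blockDim.x + threadIdx.x;    // 0..255
    const int nThreads = blockDim.x * blockDim.y;

    // Stage the current macroblock in shared memory: one pixel per thread.
    __shared__ unsigned char curBlock[BLOCK][BLOCK];
    curBlock[threadIdx.y][threadIdx.x] =
        cur[(mbY + threadIdx.y) * width + (mbX + threadIdx.x)];
    __syncthreads();

    const int span = 2 * RANGE + 1;            // candidates per dimension
    unsigned int myBestSad = 0xFFFFFFFFu;
    int2 myBestMv = make_int2(0, 0);

    // Each thread evaluates a strided subset of the span*span candidates.
    for (int c = tid; c < span * span; c += nThreads) {
        const int dx = c % span - RANGE;
        const int dy = c / span - RANGE;
        const int rx = mbX + dx;
        const int ry = mbY + dy;

        // Skip candidates whose reference block falls outside the frame.
        if (rx < 0 || ry < 0 || rx + BLOCK > width || ry + BLOCK > height)
            continue;

        unsigned int sad = 0;
        for (int y = 0; y < BLOCK; ++y)
            for (int x = 0; x < BLOCK; ++x)
                sad += abs((int)curBlock[y][x] -
                           (int)ref[(ry + y) * width + (rx + x)]);

        if (sad < myBestSad) {
            myBestSad = sad;
            myBestMv  = make_int2(dx, dy);
        }
    }

    // Block-wide reduction in shared memory to find the minimum SAD.
    __shared__ unsigned int sSad[BLOCK * BLOCK];
    __shared__ int2 sMv[BLOCK * BLOCK];
    sSad[tid] = myBestSad;
    sMv[tid]  = myBestMv;
    __syncthreads();

    for (int stride = nThreads / 2; stride > 0; stride >>= 1) {
        if (tid < stride && sSad[tid + stride] < sSad[tid]) {
            sSad[tid] = sSad[tid + stride];
            sMv[tid]  = sMv[tid + stride];
        }
        __syncthreads();
    }

    // One motion vector per macroblock, written by the first thread.
    if (tid == 0)
        mv[blockIdx.y * gridDim.x + blockIdx.x] = sMv[0];
}
```

The shared-memory staging is one example of the memory optimisations the abstract refers to: without it, every one of the (2·RANGE+1)² candidate positions would re-read the same 256 current-frame pixels from global memory. A fuller treatment would also consider texture or constant memory for the reference frame and coalesced access patterns, which the paper itself discusses.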
