Open Access
CUDA memory optimisation strategies for motion estimation
Author(s) - Sayadi Fatma Elzahra, Chouchene Marwa, Bahri Haithem, Khemiri Randa, Atri Mohamed
Publication year - 2019
Publication title - IET Computers & Digital Techniques
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.219
H-Index - 46
eISSN - 1751-861X
pISSN - 1751-8601
DOI - 10.1049/iet-cdt.2017.0149
Subject(s) - cuda , computer science , parallel computing , graphics processing unit , central processing unit , general purpose computing on graphics processing units , peak signal to noise ratio , graphics , computation , architecture , motion estimation , computer engineering , computer hardware , algorithm , image (mathematics) , artificial intelligence , computer graphics (images) , art , visual arts
As video processing technologies grow faster in complexity and image resolution than central processing unit (CPU) performance, data-parallel computing methods will become even more important. In fact, the high-performance, data-parallel architecture of modern graphics processing units (GPUs) can reduce execution times by orders of magnitude or more. However, creating an optimal GPU implementation not only requires converting sequential implementations of algorithms into parallel ones but, more importantly, requires careful balancing of GPU resources. It also requires an understanding of the bottlenecks and penalties caused by memory latency and computation. The challenge is even greater when an implementation exceeds the GPU resources. In this study, the authors discuss parallelisation and memory optimisation strategies for a computer vision application for motion estimation using the NVIDIA compute unified device architecture (CUDA). The study addresses optimisation techniques for algorithms that exceed the GPU's computation or memory resources under the CUDA architecture. The proposed implementation shows a substantial improvement in both speed-up (SU) and peak signal-to-noise ratio (PSNR). Indeed, the implementation is up to 50 times faster than its CPU counterpart, and it increases the PSNR of the coded test sequence by up to 8 dB.
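
The record does not include the authors' source code, but the following sketch illustrates the kind of kernel the abstract alludes to: a full-search, SAD-based block-matching motion estimation kernel in which each thread block processes one macroblock and stages the current block in shared memory so that every candidate displacement reuses it without re-reading global memory. All specifics here (the kernel name, the 16-pixel block size, the ±8 search range, the row-major frame layout) are illustrative assumptions, not the paper's implementation.

```cuda
// Minimal sketch of full-search block matching with a shared-memory staged
// current block. Launch with blockDim = (16, 16) and one thread block per
// 16x16 macroblock, e.g. gridDim = (width/16, height/16), assuming the
// frame dimensions are multiples of 16.

#include <cuda_runtime.h>

#define BLOCK 16   // macroblock size (assumption)
#define RANGE 8    // +/- search range in pixels (assumption)

__global__ void sadMotionEstimation(const unsigned char* cur,
                                    const unsigned char* ref,
                                    int width, int height,
                                    int2* mv)
{
    const int mbX = blockIdx.x * BLOCK;                        // macroblock origin
    const int mbY = blockIdx.y * BLOCK;
    const int tid = threadIdx.y * blockDim.x + threadIdx.x;    // 0..255
    const int nThreads = blockDim.x * blockDim.y;

    // Stage the current macroblock in shared memory: one pixel per thread.
    __shared__ unsigned char curBlock[BLOCK][BLOCK];
    curBlock[threadIdx.y][threadIdx.x] =
        cur[(mbY + threadIdx.y) * width + (mbX + threadIdx.x)];
    __syncthreads();

    const int span = 2 * RANGE + 1;            // candidates per dimension
    unsigned int myBestSad = 0xFFFFFFFFu;
    int2 myBestMv = make_int2(0, 0);

    // Each thread evaluates a strided subset of the span*span candidates.
    for (int c = tid; c < span * span; c += nThreads) {
        const int dx = c % span - RANGE;
        const int dy = c / span - RANGE;
        const int rx = mbX + dx;
        const int ry = mbY + dy;

        // Skip candidates whose reference block falls outside the frame.
        if (rx < 0 || ry < 0 || rx + BLOCK > width || ry + BLOCK > height)
            continue;

        unsigned int sad = 0;
        for (int y = 0; y < BLOCK; ++y)
            for (int x = 0; x < BLOCK; ++x)
                sad += abs((int)curBlock[y][x] -
                           (int)ref[(ry + y) * width + (rx + x)]);

        if (sad < myBestSad) {
            myBestSad = sad;
            myBestMv  = make_int2(dx, dy);
        }
    }

    // Block-wide reduction in shared memory to find the minimum SAD.
    __shared__ unsigned int sSad[BLOCK * BLOCK];
    __shared__ int2 sMv[BLOCK * BLOCK];
    sSad[tid] = myBestSad;
    sMv[tid]  = myBestMv;
    __syncthreads();

    for (int stride = nThreads / 2; stride > 0; stride >>= 1) {
        if (tid < stride && sSad[tid + stride] < sSad[tid]) {
            sSad[tid] = sSad[tid + stride];
            sMv[tid]  = sMv[tid + stride];
        }
        __syncthreads();
    }

    // One motion vector per macroblock, written by the first thread.
    if (tid == 0)
        mv[blockIdx.y * gridDim.x + blockIdx.x] = sMv[0];
}
```

The shared-memory staging is one example of the memory optimisations the abstract refers to: without it, every one of the (2·RANGE+1)² candidate positions would re-read the same 256 current-frame pixels from global memory. A fuller treatment would also consider texture or constant memory for the reference frame and coalesced access patterns, which the paper itself discusses.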
