CUDA offloading for energy‐efficient and high‐frame‐rate simulations using tablets

Martinez-Noriega Edgar Josafat; Yazaki Syunji; Narumi Tetsu

Publication year - 2019

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.5488

Subject(s) - cuda, computer science, flops, general purpose computing on graphics processing units, parallel computing, frame rate, vmebus, graphics, computation, supercomputer, graphics processing unit, embedded system, operating system, software, artificial intelligence, algorithm

Summary - The multiple sensors and touch capabilities of mobile devices are defining new methods of computer interaction. However, the computing power of such devices is not currently sufficient for emerging compute‐intensive applications. Using graphics processing units (GPUs) for general‐purpose computing with GPU programming models such as Compute Unified Device Architecture (CUDA) has been shown to accelerate simulations on supercomputers. Although CUDA‐capable chips such as the Tegra K1, which have been released in tablets, can accelerate computer simulations, their absolute computing power and performance per watt are not comparable with those of ordinary GPUs. In this paper, we analyze a heterogeneous system composed of a tablet (client) and a notebook with a low‐power GPU (server). Intensive computations on the tablet are offloaded to the notebook GPU using the rCUDA middleware. Molecular dynamics (MD) simulations are performed on this test system, and the computing speed and performance per watt are reported. Implementing dynamic parallelism (DP) reduced the latency, doubling the total frames per second in some cases. Our system achieves better computational performance and higher performance per watt than a tablet powered by a CUDA‐capable GPU. We achieved 21.7 Gflops/W by combining multiple client tablets with the server, compared with 21.3 Gflops/W from the server alone.
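Performance-per-watt figures such as those above are a sustained flop rate divided by the average power drawn. The minimal sketch below illustrates that arithmetic on the kind of workload being offloaded: an all-pairs molecular dynamics force evaluation. The abstract does not state the interaction potential, the flop count charged per interaction, or how power was measured, so the Lennard-Jones potential, the `FLOPS_PER_PAIR` constant, and the wattage used here are assumptions, and all function names are hypothetical. It runs on the CPU in plain Python; it is not the authors' CUDA kernel and does not involve rCUDA.

```python
import time

# Hypothetical flop count charged per pair interaction. The abstract does not
# state the counting convention used in the paper; 20 is an assumed value.
FLOPS_PER_PAIR = 20


def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """All-pairs Lennard-Jones forces in reduced units (no cutoff, open boundaries).

    Returns (forces, pair_count). The Lennard-Jones potential is an
    illustrative assumption, not necessarily the one used in the paper.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    sigma6 = sigma ** 6
    pairs = 0
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            inv_r2 = 1.0 / (dx * dx + dy * dy + dz * dz)
            s6 = sigma6 * inv_r2 ** 3  # (sigma / r)^6
            # F(r) / r, where F(r) = -dV/dr = 24 eps (2 (sigma/r)^12 - (sigma/r)^6) / r
            f_over_r = 24.0 * epsilon * (2.0 * s6 * s6 - s6) * inv_r2
            for k, d in enumerate((dx, dy, dz)):
                forces[i][k] += f_over_r * d  # force on i due to j
                forces[j][k] -= f_over_r * d  # equal and opposite force on j
            pairs += 1
    return forces, pairs


def gflops(pair_count, steps, seconds, flops_per_pair=FLOPS_PER_PAIR):
    """Sustained rate in Gflops: total counted flops / elapsed seconds / 1e9."""
    return pair_count * steps * flops_per_pair / seconds / 1e9


def gflops_per_watt(gflops_value, watts):
    """Energy efficiency: Gflops divided by average power in watts."""
    return gflops_value / watts


def cubic_lattice(m, spacing):
    """m x m x m particles on a simple cubic lattice (hypothetical test setup)."""
    return [(spacing * x, spacing * y, spacing * z)
            for x in range(m) for y in range(m) for z in range(m)]


if __name__ == "__main__":
    positions = cubic_lattice(4, 1.2)  # 64 particles
    steps = 5
    start = time.perf_counter()
    for _ in range(steps):
        # Repeat only the force evaluation; position updates are omitted.
        forces, pairs = lj_forces(positions)
    elapsed = time.perf_counter() - start
    rate = gflops(pairs, steps, elapsed)
    measured_watts = 15.0  # hypothetical power-meter reading, not from the paper
    print(f"{pairs} pairs/step, {rate:.4f} Gflops, "
          f"{gflops_per_watt(rate, measured_watts):.5f} Gflops/W")
```

Every pair is evaluated independently of the others, which is why this step is a natural candidate for a GPU and, in the setting described above, for offloading from a tablet to a server GPU. Note that the efficiency figure depends directly on which flop-counting convention and which power draws are included, so numbers computed this way are only comparable when those choices match.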
