CUDA offloading for energy‐efficient and high‐frame‐rate simulations using tablets
Author(s) -
Martinez‐Noriega Edgar Josafat,
Yazaki Syunji,
Narumi Tetsu
Publication year - 2019
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5488
Subject(s) - cuda , computer science , flops , general purpose computing on graphics processing units , parallel computing , frame rate , vmebus , graphics , computation , supercomputer , graphics processing unit , embedded system , operating system , software , artificial intelligence , algorithm
Summary The multiple sensors and touch capabilities of mobile devices are defining new methods of computer interaction. However, the computing power of such devices is not currently sufficient for new compute‐intensive applications. Using graphics processing units (GPUs) for general‐purpose computing with GPU programming models such as Compute Unified Device Architecture (CUDA) has been proven to accelerate simulations on supercomputers. Although CUDA‐capable chips such as the Tegra K1, which have been released in tablets, can accelerate computer simulations, their absolute computing power and performance per watt are not comparable with those of ordinary GPUs. In this paper, we analyze a heterogeneous system composed of a tablet (client) and a notebook with a low‐power GPU (server). Intensive computations on the tablet are offloaded to the notebook GPU using the rCUDA middleware. Molecular dynamics (MD) simulations are performed using our test system, and the computing speed and performance per watt are reported. Implementing dynamic parallelism (DP) reduced the latency, doubling the total frames per second in some cases. Our system achieves better computational performance and higher performance per watt than a tablet powered by a CUDA‐capable GPU. We achieved 21.7 Gflops/W by combining multiple client tablets and a server, compared with 21.3 Gflops/W from the server itself.
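
To make the reported efficiency metric concrete, the following minimal Python sketch shows one way performance per watt could be computed and how the two figures quoted in the summary compare. The helper name `gflops_per_watt` and its parameters are hypothetical illustrations, not taken from the paper; only the values 21.7 and 21.3 Gflops/W come from the summary.

```python
def gflops_per_watt(flop_count: float, seconds: float, watts: float) -> float:
    """Sustained Gflops divided by average power draw in watts.

    Hypothetical helper: the paper's exact measurement procedure is not
    described in the summary.
    """
    if seconds <= 0 or watts <= 0:
        raise ValueError("seconds and watts must be positive")
    return flop_count / seconds / 1e9 / watts


# Figures reported in the summary (Gflops/W).
COMBINED_SYSTEM = 21.7  # multiple client tablets plus server
SERVER_ONLY = 21.3      # server alone

# Relative efficiency gain of the combined system over the server alone.
relative_gain = (COMBINED_SYSTEM - SERVER_ONLY) / SERVER_ONLY
print(f"relative gain: {relative_gain:.1%}")
```

Under this reading, the combined system improves on the server alone by a little under 2% in performance per watt.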