Performance modeling of microsecond scale biological molecular dynamics simulations on heterogeneous architectures

Author(s) -

Agarwal Pratul K.,

Hampton Scott,

Poznanovic Jeffrey,

Ramanathan Arvind,

Alam Sadaf R.,

Crozier Paul S.

Publication year - 2013

Publication title -

Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.2943

Subject(s) - computer science, porting, parallel computing, multi core processor, massively parallel, cuda, workstation, graphics processing unit, computation, computational science, supercomputer, symmetric multiprocessor system, central processing unit, gpu cluster, software, computer hardware, algorithm, operating system

SUMMARY Performance improvements in biomolecular simulations based on molecular dynamics (MD) codes are widely desired. Unfortunately, the factors that allowed past performance improvements, particularly microprocessor clock frequencies, are no longer increasing. Hence, novel software and hardware solutions are being explored for accelerating the performance of widely used MD codes. In this paper, we describe our efforts on porting, optimizing and tuning the Large‐scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a popular MD framework, on heterogeneous architectures: multi‐core processors with graphics processing unit (GPU) accelerators. Our implementation is based on accelerating the most computationally expensive non‐bonded interaction terms on the GPUs and overlapping the computation on the CPU and GPUs. This functionality is built on top of the message passing interface (MPI), which allows multi‐level parallelism to be extracted even at the workstation level with multi‐core CPUs and allows the implementation to be extended to GPU‐enabled clusters. We hypothesize that the optimal benefit of heterogeneous architectures for applications will come from utilizing all possible resources (for example, CPU cores and GPU devices on GPU‐enabled clusters). Benchmarks for a range of biomolecular system sizes are provided, and an analysis is performed on four generations of NVIDIA's GPU devices. On GPU‐enabled Linux clusters, by overlapping and pipelining computation and communication, we observe up to 10‐fold application acceleration in multi‐core and multi‐GPU environments. A detailed analysis of the implementation identifies bottlenecks in the algorithm, indicating that code optimization and improvements on GPUs could allow microsecond scale simulation throughput on workstations and inexpensive GPU clusters, putting widely desired biologically relevant simulation time‐scales within reach of a large user community. In order to systematically optimize simulation throughput and to enable performance prediction, we have developed a parameterized performance model that will allow developers and users to explore the performance potential of future heterogeneous systems for biological simulations. Copyright © 2012 John Wiley & Sons, Ltd.
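
The summary does not give the form of the parameterized performance model, so the following is only a minimal illustrative sketch of how overlapping CPU and GPU work might be reasoned about. It assumes the per-timestep cost splits into four hypothetical components (GPU non-bonded kernel, host-device transfer, remaining CPU work, and MPI communication) and that the CPU work fully overlaps with the GPU side. The class, function names, and all numbers are assumptions made for illustration; none are taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepCosts:
    """Hypothetical per-timestep costs in seconds (illustrative values only)."""
    gpu_nonbonded: float  # non-bonded force kernel on the GPU
    transfer: float       # host <-> device data copies
    cpu_other: float      # bonded terms, integration, etc. on the CPU
    mpi_comm: float       # inter-process communication


def serial_step_time(c: StepCosts) -> float:
    """Step time if every component runs one after another."""
    return c.gpu_nonbonded + c.transfer + c.cpu_other + c.mpi_comm


def overlapped_step_time(c: StepCosts) -> float:
    """Step time if CPU work runs concurrently with the GPU kernel and copies.

    Assumption: the slower of the two sides bounds the step, and MPI
    communication is not overlapped with either.
    """
    return max(c.gpu_nonbonded + c.transfer, c.cpu_other) + c.mpi_comm


def ns_per_day(step_time_s: float, timestep_fs: float = 2.0) -> float:
    """Simulated nanoseconds per wall-clock day for a given step time."""
    steps_per_day = 86400.0 / step_time_s
    return steps_per_day * timestep_fs * 1e-6  # 1 fs = 1e-6 ns


if __name__ == "__main__":
    costs = StepCosts(gpu_nonbonded=0.004, transfer=0.001,
                      cpu_other=0.003, mpi_comm=0.001)
    for label, t in (("serial", serial_step_time(costs)),
                     ("overlapped", overlapped_step_time(costs))):
        print(f"{label}: {t * 1e3:.2f} ms/step, {ns_per_day(t):.1f} ns/day")
```

Under these assumptions, overlap only pays off while the CPU side is shorter than the GPU side plus transfers. Once the GPU path dominates, further gains have to come from the kernel or the copies, which is the kind of bottleneck identification the summary describes.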
