369 Tflop/s molecular dynamics simulations on the petaflop hybrid supercomputer ‘Roadrunner’
Author(s) -
Germann Timothy C.,
Kadau Kai,
Swaminarayan Sriram
Publication year - 2009
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.1483
Subject(s) - supercomputer, computer science, parallel computing, benchmark (surveying), speedup, ibm, operating system, node (physics), computational science, physics, nanotechnology, materials science, geodesy, quantum mechanics, geography
We describe the implementation of a short‐range parallel molecular dynamics (MD) code, SPaSM, on the heterogeneous general‐purpose Roadrunner supercomputer. Each Roadrunner ‘TriBlade’ compute node consists of two AMD Opteron dual‐core microprocessors and four IBM PowerXCell 8i enhanced Cell microprocessors (each consisting of one PPU and eight SPU cores), so that there are four MPI ranks per node, each with one Opteron core and one Cell. We briefly describe the Roadrunner architecture and some of the initial hybrid programming approaches that have been taken, focusing on the SPaSM application as a case study. An initial ‘evolutionary’ port, in which the existing legacy code runs with minor modifications on the Opterons and the Cells are used only to compute interatomic forces, achieves roughly a 2× speedup over the unaccelerated code. In contrast, our ‘revolutionary’ implementation adopts a Cell‐centric view, with data structures optimized for, and living on, the Cells. The Opterons are mainly used to direct inter‐rank communication and perform I/O‐heavy periodic analysis, visualization, and checkpointing tasks. The performance measured for our initial implementation of a standard Lennard–Jones pair potential benchmark reached a peak of 369 Tflop/s double‐precision floating‐point performance on the full Roadrunner system (27.7% of peak), nearly 10× faster than the unaccelerated (Opteron‐only) version. Copyright © 2009 John Wiley & Sons, Ltd.
