Performance evaluation of the SX‐6 vector architecture for scientific computations
Author(s) - Oliker Leonid, Canning Andrew, Carter Jonathan, Shalf John, Skinner David, Ethier Stéphane, Biswas Rupak, Djomehri Jahed, Van der Wijngaart Rob
Publication year - 2005
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.884
Subject(s) - computer science, suite, parallel computing, supercomputer, ibm, cache, vectorization (mathematics), key (lock), computer architecture, computation, operating system, algorithm, archaeology, materials science, history, nanotechnology
The growing gap between sustained and peak performance for scientific applications is a well‐known problem in high‐performance computing. The recent development of parallel vector systems offers the potential to reduce this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX‐6 vector processor, and compares it against the cache‐based IBM Power3 and Power4 superscalar architectures, across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines many low‐level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Overall results demonstrate that the SX‐6 achieves high performance on a large fraction of our application suite and often significantly outperforms the cache‐based architectures. However, certain classes of applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX‐6 effectively. Copyright © 2005 John Wiley & Sons, Ltd.