Development of a High-Performance Eigensolver on a Peta-Scale Next-Generation Supercomputer System
Author(s) - Toshiyuki Imamura, Susumu Yamada, Masahiko Machida
Publication year - 2011
Publication title - Progress in Nuclear Science and Technology
Language(s) - English
Resource type - Journals
ISSN - 2185-4823
DOI - 10.15669/pnst.2.643
Subject(s) - supercomputer, parallel computing, computer science, scale (ratio), computer architecture, computational science, physics, quantum mechanics
Current supercomputer systems are built from multicore, multisocket processors, and the choice of interconnect is essential. In addition, effective development of new code depends on high-performance, scalable, and reliable numerical software. ScaLAPACK and PETSc are software packages developed for distributed-memory parallel computer systems. Real computation, however, requires software that is highly tuned for new architectures, such as many-core processors. In the present study, we introduce a high-performance, highly scalable eigenvalue solver aimed at the K computer, a next-generation supercomputer system. We have developed two versions of this eigenvalue solver, namely, the standard version (eigen_s) and an enhanced-performance version (eigen_sx), both of which were developed on the T2K cluster system housed at the University of Tokyo. Eigen_s uses conventional algorithms: Householder tridiagonalization, the divide and conquer (DC) algorithm, and the Householder back-transformation. These algorithms are carefully implemented using a blocking technique and a flexible two-dimensional data distribution in order to reduce the overhead of memory traffic and data transfer, respectively. On the T2K system with 4,096 cores (theoretical peak: 37.6 TFLOPS), eigen_s achieves 3.0 TFLOPS (about 8% of peak) with a 200,000-dimensional matrix. The enhanced version, eigen_sx, uses more advanced algorithms: the narrow-band reduction algorithm, DC for band matrices, and the block Householder back-transformation with WY representation. Even though this version is still in the test stage, eigen_sx has realized 4.7 TFLOPS with a 200,000-dimensional matrix.
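
To make the first and last stages of the conventional pipeline concrete, the following is a minimal serial sketch of Householder tridiagonalization with an accumulated orthogonal factor, which is what a back-transformation would apply to the eigenvectors of the tridiagonal matrix. It is not the authors' eigen_s code: it is unblocked, uses plain Python lists instead of a two-dimensional distributed layout, and omits the divide and conquer step entirely. The function names (householder_tridiagonalize, matmul, transpose) and the 4x4 test matrix are assumptions made for illustration.

```python
import math


def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*x)]


def householder_tridiagonalize(a):
    """Reduce a symmetric matrix A to tridiagonal T with T = Q^T A Q.

    Returns (t, q). Q is the product of the Householder reflectors, so an
    eigenvector y of T maps back to an eigenvector Q y of A.
    """
    n = len(a)
    t = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        # Part of column k below the diagonal; all but its first entry
        # must be annihilated.
        x = [t[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(c * c for c in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column
        # Pick the sign that avoids cancellation when forming v.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(c * c for c in v))
        # Unit reflector vector, padded with zeros in rows 0..k.
        w = [0.0] * (k + 1) + [c / norm_v for c in v]
        # Symmetric rank-2 update: T <- H T H with H = I - 2 w w^T.
        p = [sum(t[i][j] * w[j] for j in range(n)) for i in range(n)]
        kappa = sum(w[i] * p[i] for i in range(n))
        z = [p[i] - kappa * w[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                t[i][j] -= 2.0 * (w[i] * z[j] + z[i] * w[j])
        # Accumulate the reflector: Q <- Q H.
        qw = [sum(q[i][j] * w[j] for j in range(n)) for i in range(n)]
        for i in range(n):
            for j in range(n):
                q[i][j] -= 2.0 * qw[i] * w[j]
    return t, q


# Small symmetric test matrix (illustrative values only).
a = [[4.0, 1.0, -2.0, 2.0],
     [1.0, 2.0, 0.0, 1.0],
     [-2.0, 0.0, 3.0, -2.0],
     [2.0, 1.0, -2.0, -1.0]]
t, q = householder_tridiagonalize(a)

# Residual of A - Q T Q^T; it should be at rounding-error level.
back = matmul(matmul(q, t), transpose(q))
residual = max(abs(back[i][j] - a[i][j]) for i in range(4) for j in range(4))
print("max |A - Q T Q^T| =", residual)
```

Each step of this unblocked form is dominated by a matrix-vector product and a rank-2 update, so it is limited by memory traffic rather than arithmetic. Blocking, as mentioned in the abstract, is one way to address that overhead; a blocked variant would group several reflectors and apply them together as matrix-matrix operations, which this sketch does not attempt.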
