Performance analysis of distributed symmetric sparse matrix vector multiplication algorithm for multi-core architectures

Author(s) - Oryspayev Dossay, Aktulga Hasan Metin, Sosonkina Masha, Maris Pieter, Vary James P.

Publication year - 2015

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3499

Subject(s) - computer science, parallel computing, multi-core processor, scalability, sparse matrix, matrix multiplication, kernel (algebra), algorithm, topology (electrical circuits), mathematics, gaussian, physics, quantum mechanics, database, combinatorics, quantum

Summary - Sparse matrix vector multiply (SpMVM) is an important kernel that frequently arises in high performance computing applications. Because of its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large-scale computations. In this paper, our target systems are high-end multi-core architectures, and we use a hybrid Message Passing Interface (MPI) + OpenMP programming model for parallelism. We analyze the performance of a recently proposed implementation of the distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We study important features of this implementation and compare it with previously reported implementations that do not exploit the underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the ‘CPU core hours’ metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. We have tested the different SpMVM implementations on two large clusters, one with a 3D torus topology and one with a Dragonfly topology. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the ‘CPU core hours’ metric and significantly reduces data movement overheads. Copyright © 2015 John Wiley & Sons, Ltd.
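To illustrate what exploiting matrix symmetry in SpMVM can mean, the following is a minimal serial sketch in Python. It is not the implementation analyzed in the paper: the storage layout (upper triangle in compressed sparse row form), the function name `symmetric_spmv`, and the small test matrix are all assumptions made for illustration, and the distributed MPI + OpenMP aspects are left out entirely. Each stored off-diagonal entry is used twice, once for its own row and once for the mirrored row, so only about half of the nonzeros need to be stored and read.

```python
def symmetric_spmv(n, row_ptr, col_idx, values, x):
    """Compute y = A x for a symmetric n x n matrix A.

    Only the upper triangle of A (diagonal included) is stored, in CSR
    form: row_ptr has n + 1 entries, col_idx and values hold the stored
    nonzeros row by row. This layout is an illustrative assumption.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            # Contribution of A[i][j] to row i.
            y[i] += a * x[j]
            # Mirrored contribution of A[j][i] = A[i][j] to row j;
            # skipped on the diagonal to avoid counting it twice.
            if j != i:
                y[j] += a * x[i]
    return y


# Hypothetical 3 x 3 symmetric matrix:
# [[4, 1, 0],
#  [1, 3, 2],
#  [0, 2, 5]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = symmetric_spmv(3, row_ptr, col_idx, values, x)
print(y)
```

The mirrored update writes to `y[j]` for rows other than the one being traversed. In a threaded or distributed setting those scattered writes need coordination (for example private partial result vectors that are reduced afterwards), which is one reason a symmetric SpMVM is harder to parallelize than the non-symmetric kernel.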
