MPI and UPC broadcast, scatter and gather algorithms in Xeon Phi

Author(s) - Mallón Damián A., Taboada Guillermo L., Koesterke Lars

Publication year - 2015

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3552

Subject(s) - xeon phi, computer science, supercomputer, xeon, parallel computing, infiniband, message passing interface, message passing, operating system

Summary - Accelerators have revolutionised the high performance computing (HPC) community. Despite their advantages, their very specific programming models and limited communication capabilities have kept them in a supporting role to the main processors. With the introduction of Xeon Phi, this is no longer true, as it can be programmed as the main processor and has direct access to the InfiniBand network adapter. Collective operations play a key role in many HPC applications, so studying their behaviour in the context of manycore coprocessors is of great importance. This work analyses the performance of different algorithms for broadcast, scatter and gather on a large-scale Xeon Phi supercomputer. The algorithms evaluated are those available in the reference message passing interface (MPI) implementation for Xeon Phi (Intel MPI), the default algorithm in an optimised MPI implementation (MVAPICH2-MIC), and a new set of algorithms, developed by the authors of this work, designed with modern processors and new communication features in mind. The latter are implemented in Unified Parallel C (UPC), a partitioned global address space language, and leverage one-sided communications, hierarchical trees and message pipelining. This study scales the experiments to 15360 cores on the Stampede supercomputer and compares the results with Xeon and hybrid Xeon + Xeon Phi experiments, with up to 19456 cores. Copyright © 2015 John Wiley & Sons, Ltd.
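
The summary mentions tree-based broadcast algorithms without describing them. As a rough illustration only, the Python sketch below simulates the communication schedule of a textbook binomial-tree broadcast. It is not the authors' UPC implementation, nor the algorithm used by Intel MPI or MVAPICH2-MIC; the function names `binomial_broadcast_schedule` and `simulate` are hypothetical and chosen for this example.

```python
def binomial_broadcast_schedule(num_ranks, root=0):
    """Return the rounds of a binomial-tree broadcast.

    Each round is a list of (sender, receiver) pairs. Ranks are renumbered
    relative to the root, so the root is relative rank 0. (Illustrative
    helper, not taken from the paper.)
    """
    rounds = []
    step = 1
    while step < num_ranks:
        sends = []
        # Relative ranks 0..step-1 already hold the data at this point.
        for rel in range(step):
            peer = rel + step
            if peer < num_ranks:
                sender = (rel + root) % num_ranks
                receiver = (peer + root) % num_ranks
                sends.append((sender, receiver))
        rounds.append(sends)
        step *= 2  # the set of ranks holding the data doubles each round
    return rounds


def simulate(num_ranks, root=0):
    """Replay the schedule; return (number of rounds, ranks holding the data)."""
    have = {root}
    schedule = binomial_broadcast_schedule(num_ranks, root)
    for sends in schedule:
        for sender, _ in sends:
            # A rank may only forward data it has already received.
            assert sender in have
        have.update(receiver for _, receiver in sends)
    return len(schedule), have


if __name__ == "__main__":
    rounds, have = simulate(8)
    print(rounds, sorted(have))
```

Under this schedule the number of rounds grows as the base-2 logarithm of the rank count, rounded up. If one rank ran per core, 15360 ranks would need 14 rounds. This counts rounds only and ignores message size, pipelining and the node hierarchy that the summary names as design elements.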
