MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems

Hamza Zafar; Farrukh Aftab Khan; Bryan Carpenter; Aamir Shafi; Asad Waqar Malik

Open Access

Publication year - 2015

Publication title - Procedia Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.334

H-Index - 76

ISSN - 1877-0509

DOI - 10.1016/j.procs.2015.05.379

Subject(s) - computer science, yarn, big data, java, scalability, operating system, data intensive computing, supercomputer, scheduling (production processes), software, distributed computing, programming paradigm, database, parallel computing, grid computing, programming language, operations management, materials science, geometry, mathematics, economics, composite material, grid

Many organizations, including academic, research, and commercial institutions, have invested heavily in setting up High Performance Computing (HPC) facilities for running computational science applications. Meanwhile, the Apache Hadoop software, which emerged in 2005, has become a popular, reliable, and scalable open-source framework for processing large-scale data (Big Data). Recognizing the importance of Big Data, an increasing number of organizations are investing in relatively inexpensive Hadoop clusters for executing their mission-critical data processing applications. As a result, system administrators at these sites might have to maintain two parallel facilities, one for HPC and one for Hadoop computations. This is not ideal because of redundant maintenance work and poor economics. This paper attempts to bridge this gap by allowing HPC and Hadoop jobs to co-exist on a single hardware facility. We achieve this goal by exploiting YARN (Hadoop v2.0), which decouples the computational and resource scheduling part of the Hadoop framework from HDFS. In this context, we have developed a YARN-based reference runtime system for the MPJ Express software that allows executing parallel MPI-like Java applications on Hadoop clusters. The main contribution of this paper is to provide the Big Data community with access to MPI-like programming using MPJ Express. As an aside, this work allows parallel Java applications to perform computations on data stored in the Hadoop Distributed File System (HDFS).
