High-performance computing for computational science
Author(s) - Gil-Costa Veronica, Senger Hermes
Publication year - 2020
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5904
Subject(s) - library science, computer science, citation, concurrency, world wide web, programming language
High-performance computing (HPC) is the application of large-scale computer resources to solve computational problems that are either too large for standard computers or would take too long on them. Parallel techniques, parallel and distributed programming libraries, and performance evaluation tools are used to enhance the performance and scalability of such algorithms. These techniques and tools are a valuable resource for anyone developing software for the computational sciences. Moreover, they have driven progress in several areas of science and engineering that either demand large amounts of computation or manipulate large volumes of data.

This special issue focuses on efficient experimental solutions to problems on state-of-the-art computational systems consisting of large numbers of computational elements, including clusters, massively parallel supercomputers, and GPU-based systems. The objective is to give researchers an opportunity to present and discuss new ideas and proposals for the state of the art in HPC for computational science. The expected audience includes researchers and students in academic departments, government laboratories, and industrial organizations.

This special issue presents seven papers on the performance evaluation of large-scale applications on different parallel platforms, focusing on their scalability and performance. Six papers were carefully selected from the 2018 International Meeting on High-Performance Computing for Computational Science (VECPAR). The authors of the selected papers were invited to submit extended versions based on the recommendations of VECPAR's technical program committee. In addition, to provide a broader contribution on topics related to high-performance computing for computational science, an open Call for Papers was publicly announced and distributed. In response, eight papers were submitted.
All the submissions (including the extended versions from VECPAR) underwent a rigorous review process. As a result, seven papers were accepted for publication in this special issue. Submissions co-authored by the guest editors were handled by independent editors to guarantee a blind peer-review process.

The article by Diener et al1 discussed the use of OpenMP for computing accelerators. OpenMP supports offloading, which allows largely unmodified code to be compiled for and executed on such devices. The authors also proposed a new library named Hydra that implements a concurrent execution pattern and allows scientific applications to leverage heterogeneous architectures with few modifications. Hydra helps address work partitioning and data movement. For work partitioning, Hydra executes the code on both the host and the device and measures the execution time on each; the resulting performance ratio is used to calculate the work split between host and devices. Data movement is handled locally: data are kept on each device as long as possible and copied only when necessary. The authors evaluated both OpenMP and Hydra with a scientific application named PlasCom2 on three heterogeneous computing platforms and gave practical guidelines for improving performance. PlasCom2 is a multi-physics simulation application built from modular, reusable components designed to be adapted to modern, heterogeneous architectures. Performance results showed that an application running on CPU+GPU with Hydra achieves additional gains of up to 20%.

The goal of the work presented by Wu et al2 was to generate non-Hermitian matrices in parallel, to enable the comparison of solvers. More precisely, the authors presented a scalable matrix generator from given spectra (SMG2S) for benchmarking linear-system and eigenvalue solvers on large-scale platforms. One such platform is the Tianhe-2 supercomputer with 16 000 nodes.
Other platforms are JURECA, with 1872 compute nodes, and the ROMEO system, with 130 nodes. The parallel implementation of SMG2S was evaluated on homogeneous and heterogeneous clusters composed of CPUs and multiple GPUs, and the authors measured speedups in fairly large settings. Experimental results showed that the proposed generator has good scalability and keeps the given spectra with acceptable accuracy. For large matrices, I/O operations are a bottleneck even with high bandwidth.

Garcia et al3 presented PAMPAR, a set of benchmark programs supporting the Pthreads, OpenMP, MPI-1, and MPI-2 parallel programming interfaces. It is composed of 11 implementations of different algorithms, such as Dijkstra's shortest-path algorithm, Gram-Schmidt orthogonalization, the discrete Fourier transform, and the Jacobi method, among others. These algorithms were chosen to stress both embedded and general-purpose multicore processors. The authors used the PAPI tool to access hardware counters and gather data about total instructions, cache accesses, branch instructions, and floating-point operations. They also used the Performance Counter Monitor (PCM) tool to measure energy consumption.

Partial differential equations (PDEs) are widely employed in many scientific and engineering applications. Cabral et al4 studied the performance of three numerical methods for solving PDEs on a 2D domain and evaluated them on two shared-memory architectures, namely the multicore Skylake (SKL) and the manycore Knights Landing (KNL). The authors studied several implementations combining OpenMP and MPI and found that the best configuration depends on characteristics such as the problem size, the number of synchronization points, and characteristics of the NUMA architecture.
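To make the kind of kernel discussed above concrete, the sketch below shows a minimal serial 2D Jacobi sweep for Laplace's equation with fixed boundary values — the stencil pattern that underlies both the Jacobi benchmark in PAMPAR and the PDE solvers studied by Cabral et al. This is an illustrative toy in plain Python, not code from any of the papers; the grid size, tolerance, and boundary values are arbitrary choices for the example.

```python
# Toy 2D Jacobi iteration for Laplace's equation on a square grid with
# fixed (Dirichlet) boundaries. Illustrative only — not taken from
# PAMPAR or from Cabral et al.; parameters are arbitrary.

def jacobi_2d(grid, tol=1e-6, max_iters=10_000):
    """Repeatedly replace each interior point by the average of its
    four neighbours until the largest per-point change is below tol."""
    n = len(grid)
    for it in range(max_iters):
        new = [row[:] for row in grid]          # next iterate
        diff = 0.0
        for i in range(1, n - 1):               # interior points only
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                    + grid[i][j - 1] + grid[i][j + 1])
                diff = max(diff, abs(new[i][j] - grid[i][j]))
        grid = new
        if diff < tol:                          # converged
            return grid, it + 1
    return grid, max_iters

# Toy problem: top edge held at 1.0, all other boundaries at 0.0.
n = 16
u = [[0.0] * n for _ in range(n)]
u[0] = [1.0] * n
solution, iters = jacobi_2d(u)
```

In shared-memory studies such as Cabral et al's, the two inner loops over interior points are what gets parallelized (for example with an OpenMP parallel for), and the grid swap at the end of each sweep is a synchronization point — which is why the number of such points and the NUMA placement of the grid matter for performance.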