Taming next‐generation HPC systems: Run‐time system and algorithmic advancements

Author(s) - Wyrzykowski Roman, Szymanski Boleslaw K.

Publication year - 2020

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.6153

Subject(s) - computer science, supercomputer, parallel computing

This special issue of Concurrency and Computation: Practice and Experience contains revised and extended versions of selected papers presented at the 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019, which was held on September 8–11, 2019 in Bialystok, Poland. PPAM 2019 was organized by the Department of Computer and Information Science of the Czestochowa University of Technology together with the Bialystok University of Technology, under the patronage of the Committee of Informatics of the Polish Academy of Sciences, in technical cooperation with the IEEE Computer Society and IEEE Computational Intelligence Society.

PPAM is a biennial series of international conferences dedicated to exchanging ideas between researchers involved in parallel and distributed computing, including theory and applications, as well as applied and computational mathematics. Twelve previous events have been held in different universities in Poland since 1994, when the first PPAM took place in Czestochowa. Thus, the event in Bialystok was an opportunity to celebrate the 25th anniversary of PPAM. The focus of PPAM 2019 was on models, algorithms, and software tools that facilitate efficient and convenient use of modern parallel and distributed computing systems, as well as on large-scale modern applications, including advances in machine learning and artificial intelligence.

This meeting gathered more than 170 participants from 26 countries. The accepted papers were presented at the regular tracks of the PPAM 2019 conference and during the workshops. With each submission evaluated by at least three reviewers, a strict reviewing process resulted in the acceptance of 91 contributed papers for publication in the conference proceedings, while approximately 43% of the submissions were rejected. The Program Committee selected 41 papers for presentation in the regular conference track, resulting in an acceptance rate of about 46%.
Based on the review results, 10 papers (11% of submissions) were selected for a special journal issue. Besides quality, another important criterion for selection was each paper's contribution to the thematic consistency of the issue. The focus of this special issue is on algorithmic advancements in matching the software properties to parallel architecture, including GPU accelerators and clusters. These advancements are crucial for successfully parallelizing such complex applications as simulating geophysical flows, solving ordinary differential equations (ODEs), structural analysis of nuclear reactor containment buildings, solving generalized eigenvalue problems, modeling of material science phenomena, and others. A complementary topic of this issue is advances in run-time systems, since increasing levels of parallelism in multi- and many-core chips and the emerging heterogeneity of computational resources, coupled with energy, resilience, and data movement constraints, radically increase the importance of efficient run-time scheduling and execution control.

After the conference, the Program Committee invited the authors of selected papers to submit revised and extended versions of their works. These new versions were reviewed independently again by at least three reviewers. Finally, nine contributions were accepted for publication. They are summarized below.

Paper [1] focuses on the accurate assembly of the system matrix, which is an essential step in any code that solves partial differential equations on a mesh. This step can become costly in multigrid codes requiring cascades of matrices that depend upon each other, or dynamic adaptive mesh refinement. To reduce the time to solution, the authors propose that these constructions can be performed concurrently with the multigrid cycles. Furthermore, they desynchronize the assembly from the solution process. This non-trivial increase in the concurrency level improves the scalability. As assembly routines are notoriously memory- and bandwidth-demanding, the final algorithmic enhancement uses a hierarchical, lossy compression scheme that brings the memory footprint down aggressively even when the system matrix entries carry little information or are not yet available with high accuracy.

An efficient algorithm for the parallel solution of indefinite saddle point systems with iterative solvers based on the Golub–Kahan bidiagonalization is presented in Reference [2]. Such systems arise in many application fields, for example, in structural mechanics. A scalability study of the generalized solver shows improved performance for the two-dimensional (2D) Stokes equations compared to previous works. Furthermore, the authors investigate the performance of different parallel inner solvers in the outer Golub–Kahan iteration for a three-dimensional (3D) Stokes problem. When the number of cores increases for a fixed problem size, the solver exhibits good speedups of up to 50% on 1024 cores. For the tests in which the problem size grows while the workload on each core stays constant, the performance of the solver scales almost linearly with the number of cores.

Paper [3] proposes a locality optimization technique for the parallel solution on GPUs of large systems of ODEs by explicit one-step methods. This technique is based on tiling across the stages of a one-step method and is enabled by the special structure of this class of ODE systems, namely a limited access distance. The paper focuses on increasing the range of access distances for which the tiling technique can provide a speedup
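
As a rough illustration of what tiling across stages under a limited access distance can mean, the sketch below applies one step of the two-stage Heun method to a system in which component i of the right-hand side reads only components i-d to i+d. This is not the implementation from Paper [3]: it runs on the CPU in plain Python, and the right-hand side, the function names (`rhs`, `heun_step_untiled`, `heun_step_tiled`), the tile size, and the choice to recompute halo values redundantly are all assumptions made for the example.

```python
# Illustrative sketch only; not the method of Paper [3].
# Assumption: component i of f(y) reads only y[i-d] .. y[i+d].

def rhs(y, i, d, n):
    """Hypothetical right-hand side with access distance d."""
    lo, hi = max(0, i - d), min(n - 1, i + d)
    return sum(y[j] - y[i] for j in range(lo, hi + 1))

def heun_step_untiled(y, h, d):
    """Heun step with one full sweep over the system per stage."""
    n = len(y)
    k1 = [rhs(y, i, d, n) for i in range(n)]
    y1 = [y[i] + h * k1[i] for i in range(n)]
    k2 = [rhs(y1, i, d, n) for i in range(n)]
    return [y[i] + 0.5 * h * (k1[i] + k2[i]) for i in range(n)]

def heun_step_tiled(y, h, d, tile):
    """Same step, but both stages are evaluated tile by tile."""
    n = len(y)
    out = [0.0] * n
    for start in range(0, n, tile):
        end = min(n, start + tile)
        # Stage 2 on [start, end) reads stage-1 values from a halo of
        # width d on each side, so stage 1 is computed on [lo, hi).
        lo, hi = max(0, start - d), min(n, end + d)
        k1 = {i: rhs(y, i, d, n) for i in range(lo, hi)}
        y1 = {i: y[i] + h * k1[i] for i in range(lo, hi)}
        for i in range(start, end):
            k2 = rhs(y1, i, d, n)
            out[i] = y[i] + 0.5 * h * (k1[i] + k2)
    return out
```

In this sketch each tile keeps its stage-1 values local instead of writing them out for a second full sweep, at the cost of recomputing 2d halo entries per tile. That overhead grows with d relative to the tile size, which gives one intuition for why the usable range of access distances matters.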
