Prospects and challenges of virtual machine migration in HPC
Author(s) - Pickartz Simon, Clauss Carsten, Breitbart Jens, Lankes Stefan, Monti Antonello
Publication year - 2018
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4412
Subject(s) - computer science , process migration , distributed computing , node (physics) , process (computing) , workload , fault tolerance , software , ranging , domain (mathematical analysis) , network topology , computer network , operating system , telecommunications , engineering , mathematical analysis , mathematics , structural engineering
Summary - The continuous growth of supercomputers is accompanied by increasing complexity, both at the intra‐node level and in the interconnection topology. Consequently, the whole software stack, ranging from the system software to the applications, has to evolve, e.g., by providing fault tolerance and by supporting the rising intra‐node parallelism. Migration techniques are one means to address these challenges. On the one hand, they facilitate maintenance by enabling the evacuation of individual nodes during runtime, i.e., they implement fault avoidance. On the other hand, they enable dynamic load balancing, which improves the system's efficiency. However, these prospects come with certain challenges. On the process level, migration mechanisms have to resolve so‐called residual dependencies on the source node, e.g., on the communication hardware. On the job level, migrations affect the communication topology, which the communication stack should take into account: the optimal communication path between a pair of processes might change after a migration. In this article, we explore migration mechanisms for HPC and discuss their prospects as well as their challenges. Furthermore, we present solutions that enable their efficient use in this domain. Finally, we evaluate our prototype co‐scheduler, which leverages migration for workload optimization.
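To illustrate why a communication stack has to react to migrations, the following Python snippet is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical two-channel policy (shared memory for processes on the same node, the interconnect otherwise) and a per-pair channel cache. The class `CommStack` and its methods `path` and `on_migration` are illustrative names introduced here. After a migration, the cached channels involving the migrated process are discarded so the path can be re-evaluated against the new placement.

```python
from typing import Dict, Tuple


class CommStack:
    """Hypothetical communication layer that caches one channel per process pair."""

    def __init__(self, placement: Dict[int, str]) -> None:
        self.placement = dict(placement)              # rank -> node it currently runs on
        self.paths: Dict[Tuple[int, int], str] = {}   # cached channel per unordered pair

    def path(self, a: int, b: int) -> str:
        key = (min(a, b), max(a, b))
        if key not in self.paths:
            # Illustrative policy: shared memory for co-located ranks, interconnect otherwise.
            same_node = self.placement[a] == self.placement[b]
            self.paths[key] = "shared_memory" if same_node else "interconnect"
        return self.paths[key]

    def on_migration(self, rank: int, target_node: str) -> None:
        self.placement[rank] = target_node
        # Cached channels involving the migrated rank may now be suboptimal, so drop them.
        self.paths = {k: v for k, v in self.paths.items() if rank not in k}


stack = CommStack({0: "node01", 1: "node02", 2: "node02"})
print(stack.path(0, 1))   # interconnect
stack.on_migration(1, "node01")
print(stack.path(0, 1))   # shared_memory
print(stack.path(1, 2))   # interconnect
```

Without the invalidation step in `on_migration`, the pair (0, 1) would keep using the interconnect even though both processes now share a node. Real stacks must additionally tear down and re-establish hardware connections, which this sketch does not model.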