Open Access
The design and implementation of Berkeley Lab's Linux checkpoint/restart
Author(s) - Jason Duell
Publication year - 2005
Language(s) - English
Resource type - Reports
DOI - 10.2172/891617
Subject(s) - operating system, computer science, Linux kernel, scheduling, interface, kernel, component, system call, parallel computing, embedded system, engineering
This paper describes Berkeley Linux Checkpoint/Restart (BLCR), a Linux kernel module that allows system-level checkpoints on a variety of Linux systems. BLCR can be used either as a standalone system for checkpointing applications on a single machine, or as a component used by a scheduling system or parallel communication library to checkpoint and restore parallel jobs running on multiple machines. Integration with Message Passing Interface (MPI) and other parallel systems is described.