
Final Report: Migration Mechanisms for Large-scale Parallel Applications
Author(s) -
Jason Nieh
Publication year - 2009
Language(s) - English
Resource type - Reports
DOI - 10.2172/966698
Subject(s) - process migration , computer science , distributed computing , virtualization , downtime , high availability , operating system , process (computing) , virtual machine , live migration , load balancing (electrical power) , locality , fault tolerance , cloud computing , linguistics , philosophy , geometry , mathematics , grid
Process migration is the ability to transfer a process from one machine to another. It is a useful facility in distributed computing environments, especially as computing devices become more pervasive and Internet access becomes more ubiquitous. The potential benefits of process migration, among others, are fault resilience by migrating processes off of faulty hosts, data access locality by migrating processes closer to the data, better system response time by migrating processes closer to users, dynamic load balancing by migrating processes to less loaded hosts, and improved service availability and administration by migrating processes before host maintenance so that applications can continue to run with minimal downtime. Although process migration provides substantial potential benefits and many approaches have been considered, achieving transparent process migration functionality has been difficult in practice. To address this problem, our work has designed, implemented, and evaluated new and powerful transparent process checkpoint-restart and migration mechanisms for desktop, server, and parallel applications that operate across heterogeneous cluster and mobile computing environments. A key aspect of this work has been to introduce lightweight operating system virtualization to provide processes with private, virtual namespaces that decouple and isolate processes from dependencies on the host operating system instance. This decoupling enables processes to be transparently checkpointed and migrated without modifying, recompiling, or relinking applications or the operating system. Building on this lightweight operating system virtualization approach, we have developed novel technologies that enable (1) coordinated, consistent checkpoint-restart and migration of multiple processes, (2) fast checkpointing of process and file system state to enable restart of multiple parallel execution environments and time travel, (3) process migration across heterogeneous software environments, (4) network checkpoint-restart and migration of distributed and parallel applications, (5) a utility computing infrastructure for mobile desktop cloud computing based on process checkpoint-restart and migration functionality, (6) a process migration security architecture for protecting applications and infrastructure from denial-of-service attacks, and (7) a checkpoint-restart mobile computing system using portable storage devices