
Colloquium - Scales

Replaying the Execution of Virtual Machines -- Implementation and Use Cases
VMware, Inc.

In this talk, I will describe the implementation of a complete system that can record the execution of a uniprocessor virtual machine and then replay that execution at any time. Our implementation guarantees exact replay, has fairly low overhead, and allows a replaying VM to be inspected or to "go live" (break off from the recorded execution) at any point. I will first describe the basic implementation, and then some of the issues encountered in ensuring both correctness and good performance.

The basic record/replay capability enables a variety of potential applications, many of which we are investigating. I will describe our work on improving the debugging experience by allowing a user to debug a recorded execution. Debugging a recorded execution ensures that a non-deterministic bug can always be reproduced in the debugging session, and it also enables new primitives such as reverse execution. We have also developed a tool that allows ASSERTs that were not enabled during the original execution to be enabled during its replay.
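
As a rough illustration of the idea (not the VMware implementation, which the abstract does not detail), the sketch below treats a "guest" as a deterministic function of its inputs, logs every non-deterministic input while recording, and feeds the log back during replay. All names here (`run`, `record`, `replay`, `check`) are hypothetical, and the optional `check` callback stands in for an assertion that is enabled only during replay.

```python
import random


def run(next_input, steps, check=None):
    """Toy deterministic 'guest': its state depends only on the inputs fed to it."""
    state = 0
    trace = []
    for _ in range(steps):
        value = next_input()
        state = (state * 31 + value) % 1_000_003
        if check is not None:
            check(state)  # extra check, e.g. enabled only while replaying
        trace.append(state)
    return trace


def record(steps):
    """Run with non-deterministic inputs and log each one as it is consumed."""
    rng = random.Random()  # unseeded: stands in for real non-determinism
    log = []

    def next_input():
        value = rng.randrange(256)
        log.append(value)
        return value

    return log, run(next_input, steps)


def replay(log, check=None):
    """Re-run the guest, taking every input from the log instead of the outside world."""
    entries = iter(log)
    return run(lambda: next(entries), len(log), check)
```

Because the toy guest is deterministic apart from its logged inputs, `replay(log)` visits the same states as the recorded run, and a `check` passed only to `replay` observes those states without having been present during recording.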

Finally, the record/replay capability can also be used to provide a form of fault tolerance, in which two virtual machines (VMs) on two different physical hosts run in near lockstep. One VM (the primary) records its execution as it runs, and the other VM (the backup) replays that execution. If the primary VM or its host fails, the backup VM is ready to take over immediately. I will describe our current prototype in this area, and some of the difficult implementation issues that we have had to solve.
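
The primary/backup arrangement can be sketched in the same toy style. This is only an illustration under simplifying assumptions (an in-memory list as the log channel, a failure at a known step, no network delay), and the names `Replica` and `run_with_failover` are hypothetical rather than taken from the prototype.

```python
class Replica:
    """Toy deterministic VM: its state is a function of the inputs applied so far."""

    def __init__(self):
        self.state = 0
        self.applied = 0

    def apply(self, value):
        self.state = (self.state * 31 + value) % 1_000_003
        self.applied += 1


def run_with_failover(inputs, fail_at):
    """Primary records inputs; backup replays them and goes live when the primary fails."""
    primary, backup = Replica(), Replica()
    log_channel = []  # stands in for the log stream between the two hosts
    for step, value in enumerate(inputs):
        if step < fail_at:
            primary.apply(value)
            log_channel.append(value)      # primary records the input
            backup.apply(log_channel[-1])  # backup replays it in near lockstep
        else:
            backup.apply(value)            # primary is gone: backup takes inputs directly
    return primary, backup
```

Since the backup has replayed every logged input up to the failure, it ends in the same state an uninterrupted single machine would have reached.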

Dr. Daniel Scales is a principal engineer at VMware, Inc., and one of the original architects of the ESX Server product. Since then, he has focused on designing and building the storage and file system functionality that makes ESX Server robust and scalable enough to be used by enterprise customers. More recently, he has worked on higher-level applications of virtual machines such as record/replay. Before working at VMware, Inc., he was a member of the research staff of DEC/Compaq Western Research Laboratory and did research on distributed systems and compilers.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)