Units of Computation

Reasoning about modern communication and computing systems is difficult because, among other things, processes are distributed and may fail at any time. However, with the increasing use of these systems in everyday life, it is vital to understand such systems and reason about them correctly so that they can be designed better. We develop a framework that helps in understanding a fault-tolerant distributed system and so aids in designing such systems. We illustrate the use of this framework in application areas such as checkpointing and recovery, phase termination detection, stable property detection, implementing membership protocols, debugging, and the design of programming languages.

We define a unit of computation and refer to it as a molecule. A molecule has a well-defined interface with other molecules. The smallest such unit, an indivisible molecule, is called an atom. We show that any execution of a fault-tolerant distributed computation can be seen as an execution of molecules and atoms in a partial order. This view provides insight into the computation, particularly for a fault-tolerant system, where it is important to guarantee that a unit of computation is executed either completely or not at all, and where system designers need to reason about the states reached after such units execute. Molecules are essentially a generalization of atomic actions.
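As a rough illustration of the all-or-nothing and partial-order ideas, the following minimal Python sketch may help. It is not the framework described here: the `Molecule` class, the `run` function, the snapshot-based rollback, and the example state are all hypothetical names and choices made for illustration, and the sketch runs in a single process rather than a distributed system.

```python
import copy
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Molecule:
    """Hypothetical unit of computation: a list of steps plus its predecessors."""

    def __init__(self, name, steps, after=()):
        self.name = name
        self.steps = steps          # callables that mutate a shared state dict
        self.after = tuple(after)   # names of molecules that must complete first


def run(molecules, state):
    """Run molecules in an order consistent with the partial order.

    Each molecule either applies all of its steps or leaves the state
    unchanged (assumed rollback strategy: restore a snapshot on failure).
    A molecule whose predecessor did not complete is skipped.
    """
    by_name = {m.name: m for m in molecules}
    graph = {m.name: set(m.after) for m in molecules}
    completed, not_run = [], set()
    for name in TopologicalSorter(graph).static_order():
        m = by_name[name]
        if any(p in not_run for p in m.after):
            not_run.add(name)
            continue
        snapshot = copy.deepcopy(state)
        try:
            for step in m.steps:
                step(state)
        except Exception:
            # Undo any partial effects so the molecule appears not to have run.
            state.clear()
            state.update(snapshot)
            not_run.add(name)
        else:
            completed.append(name)
    return completed


def debit(s):
    s["balance"] -= 30


def credit(s):
    s["balance"] += 5


def crash(s):
    raise RuntimeError("simulated process failure")


state = {"balance": 100}
molecules = [
    Molecule("a", [debit]),
    Molecule("b", [credit, crash], after=["a"]),  # fails midway, rolled back
    Molecule("c", [credit], after=["a"]),
    Molecule("d", [credit], after=["b"]),         # skipped: "b" never completed
]
completed = run(molecules, state)
```

In this sketch, "b" and "c" are unordered with respect to each other, so either may run first; the final state is the same in both cases because the partial effect of "b" is undone before anything else observes it.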

Publications



Copyright © 1997 Shivakant Mishra