Units of Computation
Reasoning about modern communication and computing systems
is difficult because, among other things, processes are distributed and may
fail at any time. However, with the increasing use of these systems in everyday
life, it is vitally important to understand such systems and to reason
about them correctly, so that they can be designed better.
We develop a framework that helps in understanding fault-tolerant distributed
systems and so aids in designing them. We illustrate its use in
application areas such as checkpointing and recovery,
phase termination detection, stable property detection, implementation of membership
protocols, debugging, and programming language design. We define a
unit of computation and refer to it as a molecule. A molecule has a
well-defined interface with other molecules. The smallest such unit, an indivisible
molecule, is termed an atom. We show that any execution of a fault-tolerant
distributed computation can be seen as an execution of molecules or atoms
in a partial order. This view provides insight into
the computation, particularly for a fault-tolerant system, where it is important
to guarantee that a unit of computation is executed either completely or
not at all, and where system designers need to reason about the states reached
after such units execute. Molecules are essentially a generalization of atomic actions.
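
The idea of viewing an execution as units related by a partial order can be illustrated with a small sketch. It is a toy illustration only, not the formal definition from the papers listed below: it assumes that an execution is given as a set of "happened before" pairs between events, that a candidate grouping assigns each event to one named unit, and that a grouping is acceptable when the order lifted to the units has no cycles. The names `molecule_order` and `is_partial_order`, the events, and the groupings are all hypothetical.

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a set of (a, b) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow during the pass.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def molecule_order(happened_before, molecule_of):
    """Lift an event-level order to the units (assumed grouping) that contain the events.

    happened_before: set of (e1, e2) pairs meaning e1 precedes e2.
    molecule_of: dict mapping each event to the name of its unit.
    """
    lifted = {
        (molecule_of[a], molecule_of[b])
        for a, b in happened_before
        if molecule_of[a] != molecule_of[b]
    }
    return transitive_closure(lifted)


def is_partial_order(relation):
    """A transitively closed relation is a strict partial order iff it is irreflexive."""
    return all(a != b for a, b in relation)


# Hypothetical execution: p1, p2 run on one process and q1, q2 on another.
# A message goes from p1 to q1, and another from q2 to p2.
hb = {("p1", "p2"), ("q1", "q2"), ("p1", "q1"), ("q2", "p2")}

# Grouping 1: three units that depend on each other in one direction only.
good = {"p1": "M1", "q1": "M2", "q2": "M2", "p2": "M3"}

# Grouping 2: one unit per process; each unit depends on the other.
bad = {"p1": "X", "p2": "X", "q1": "Y", "q2": "Y"}

print(is_partial_order(molecule_order(hb, good)))
print(is_partial_order(molecule_order(hb, bad)))
```

Under these assumptions the first grouping yields units that can be ordered, while the second does not, because each per-process unit both sends to and receives from the other and so neither can be treated as executing entirely before the other.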
Publications
- M. Ahuja and S. Mishra, Units of Computation in Fault-Tolerant
  Distributed Systems. Journal of Parallel and Distributed Computing, Vol.
  40, No. 2 (February 1997), 194--209.
- M. Ahuja and S. Mishra, Units of Computation in Fault-Tolerant
  Distributed Systems. Proceedings of
  the 14th International Conference on Distributed Computing Systems (ICDCS 1994),
  Poznan, Poland (June 1994).
Copyright © 1997 Shivakant Mishra