Availability Management Service
In present systems, responsibility for reconfiguring a system
after failures, or after processors are removed for maintenance, rests mostly with
human system operators. Under stress, operators may well
make mistakes when attempting repair actions, and these mistakes can lead to further
failures and service unavailability. It is therefore attractive to replace
the human-controlled reconfiguration service with a software-implemented
Availability Management service that automatically reconfigures a system
in response to failures and restarts, as soon as these events are detected.
We explain the main ideas behind an Availability Management service for
an asynchronous distributed system.
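
To make the idea of event-driven reconfiguration concrete, the following is a minimal sketch and not the service described here. It assumes that some failure detector (not shown) delivers failure and restart notifications, and it uses a deliberately simple policy: every service is kept on the least-loaded live processor. The names AvailabilityManager, on_failure, on_restart, and add_service are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AvailabilityManager:
    """Toy manager (hypothetical): keeps each service placed on a live processor."""
    live: set = field(default_factory=set)         # processors currently believed up
    placement: dict = field(default_factory=dict)  # service name -> hosting processor

    def _least_loaded(self):
        # Count services per live processor; ties are broken by processor name.
        load = {p: 0 for p in self.live}
        for host in self.placement.values():
            if host in load:
                load[host] += 1
        return min(sorted(load), key=load.get) if load else None

    def _reconfigure(self):
        # Re-place every service whose host is no longer live (or was never set).
        for service in sorted(self.placement):
            if self.placement[service] not in self.live:
                self.placement[service] = self._least_loaded()

    def add_service(self, service):
        self.placement[service] = None
        self._reconfigure()

    def on_failure(self, processor):
        # Assumed to be called by a failure detector when a processor is suspected.
        self.live.discard(processor)
        self._reconfigure()

    def on_restart(self, processor):
        # Assumed to be called when a processor restarts or rejoins.
        self.live.add(processor)
        self._reconfigure()


manager = AvailabilityManager()
manager.on_restart("p1")
manager.on_restart("p2")
manager.add_service("db")
manager.add_service("web")
manager.on_failure("p1")   # "db" was on p1 and is moved automatically
print(manager.placement)   # {'db': 'p2', 'web': 'p2'}
```

In an asynchronous system a failure notification may be a false suspicion, since a slow processor cannot be told apart from a crashed one. The sketch ignores this and treats every notification as accurate.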
Copyright © 1996 Shivakant Mishra