Availability Management Service

In present-day systems, responsibility for reconfiguring a system after failures, or after processors are removed for maintenance, rests mostly with human operators. Under stress, operators may well make mistakes when attempting repair actions, and these mistakes can lead to further failures and service unavailability. It is therefore attractive to replace human-controlled reconfiguration with a software-implemented Availability Management service that automatically reconfigures the system as soon as failures and restarts are detected. We explain the main ideas behind such a service for an asynchronous distributed system.

Publications



Copyright © 1996 Shivakant Mishra