12/14/2007 2:00pm-4:00pm ECCR 1B06
Enhanced Server Fault-Tolerance Techniques for Improved User Experience
Computer Science PhD Candidate
User applications such as email, calendars, and maps are migrating from local
desktop machines to data centers because of the many advantages this computing
paradigm offers. This trend is in turn driving a marked increase in the number
of servers deployed in data centers. To ride the price/performance curves for
CPUs, memory, and other hardware, data centers find inexpensive commodity
machines the most cost-effective choice, even though such machines have low
availability. However, more frequent server failures cause service outages and
degrade the user experience, which in turn costs businesses revenue. Emerging
web applications also place additional demands on server fault-tolerance. For
example, while a user is browsing a map service such as Google, Yahoo, or MSN
Maps, a server failure that causes an outage of more than a few seconds is
noticeable to the user and hence degrades the user experience.
In this thesis, I propose three novel techniques aimed at improving server
fault-tolerance: (1) ST-TCP, an extension of TCP that tolerates server failures
by using an active backup, which replicates the state of a primary server and
seamlessly takes over a TCP connection when the primary fails; (2) CRAFT, which
enhances the TCP splicing mechanism to make it both fault-tolerant and more
scalable, and which forms the basis of a scalable, fault-tolerant web server
architecture that specifically addresses fault-tolerance for highly
interactive or real-time applications; and (3) call-preserving failover, an
efficient and scalable fault-tolerance mechanism for migrating IP telephony
calls to an alternate call controller.