
Colloquium - Yuan

Diagnosing Production Failures with Better Logging Support
University of Illinois, Urbana-Champaign

Software systems often fail in production environments. Because these failures directly affect customers, large system vendors typically invest significant resources in diagnosing them. Unfortunately, diagnosing production failures is notoriously difficult. Constrained by privacy concerns and cost, software vendors often cannot reproduce such failures. As a result, support engineers and developers continue to rely on the logs printed by the runtime system to diagnose them. However, today's system logs are ad hoc and are frequently insufficient for effective failure diagnosis.

In this talk, I will describe our work on improving software logging for better production failure diagnosis. One approach, LogEnhancer, uses a novel combination of program analysis and systems techniques to collect additional information for each existing log message. Another approach, LogError, tackles the problem of "silent failures," that is, failures that print no log messages at all. We applied LogEnhancer and LogError to a broad range of real software systems and found that improved logging can significantly improve postmortem failure diagnosis. The insights we learned could also help programmers design their software for better failure diagnosability.

Ding Yuan is a PhD candidate at the University of Illinois at Urbana-Champaign. He is also a visiting student at the University of California, San Diego. His research interests span systems, software engineering, and programming languages, with a focus on practical approaches to failure diagnosis. He has received two ASPLOS best paper nominations, an ACM SIGSOFT Distinguished Paper award, an Outstanding Teaching Assistant award, and a Saburo Muroga Fellowship. Large vendors, including Cisco, EMC, Huawei, NetApp, and Qualcomm, have requested the release of his failure-diagnosis work.

Hosted by Li Shang.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)