
Colloquium - Stanzione

A Scalable Framework for Offline Parallel Debugging
Arizona State University

As supercomputers continue to grow, an expanding community of users finds it increasingly easy to run jobs on thousands, tens of thousands, or even hundreds of thousands of cores. However, the ability to debug these jobs at scale has not kept pace with the growth in hardware. This talk presents GDBase, a framework for offline debugging. GDBase solves three problems in large-scale debugging: (1) it integrates with batch systems so that debugging jobs can run without interrupting production operation; (2) it moves debugging from online to offline, reducing the amount of system time consumed; and (3) it stores results in a database, allowing automated analysis of the vast quantities of debugging data that large jobs can produce. To date, GDBase has been used to debug runs of more than 8,000 MPI tasks.

Dr. Daniel Stanzione, Director of the High Performance Computing Initiative (HPCI) at Arizona State University, joined the Ira A. Fulton School of Engineering in 2004. Prior to ASU, he served as an AAAS Science Policy Fellow in the Division of Graduate Education at the National Science Foundation. Stanzione began his career at Clemson University, where he earned his doctoral and master's degrees in computer engineering as well as his bachelor of science in electrical engineering. He then directed the supercomputing laboratory at Clemson and also served as an assistant research professor of electrical and computer engineering. At the HPCI, Stanzione's team collaborates with UT-Austin in operating the "Ranger" supercomputer for NSF's TeraGrid, currently the sixth-largest system in the world.

Dr. Stanzione's research focuses on parallel programming, scientific computing, scheduling in computational grids, alternative architectures for high-end computing, reconfigurable/adaptive computing, and algorithms for high-performance bioinformatics. An advocate of engineering education as well, he facilitates student research through the HPCI and teaches specialized computation engineering courses.

This talk is sponsored by the National Center for Atmospheric Research Computational & Information Systems Laboratory and will be held in the Main Seminar Room at the Mesa Lab.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)