Colloquia 1999-2000

Colloquium - Speight

Efficient Parallel Computing on Clusters of Multiprocessors
Rice University

The face of parallel computing has changed significantly over the last decade. Thanks to the availability of low-cost, high-performance workstations and user-level networks, clusters of small-scale symmetric multiprocessors have emerged as a viable alternative to earlier, expensive monolithic systems. These clusters offer an excellent price/performance point for a variety of small- to medium-scale parallel programming needs, but providing a cohesive programming environment on such clusters remains a challenging task.

This talk will describe the Brazos Parallel Programming Environment. Brazos has been under development for five years and addresses many of the problems of using a cluster as a single parallel computer. In particular, Brazos currently provides both a shared-memory programming interface, through the ANL macros, and a message-passing interface, through an implementation of the MPI library. Furthermore, Brazos programmers can combine both programming styles in the same application. Brazos achieves superior performance on shared-memory applications through the selective use of multicast communication, adaptive runtime performance tuning, and a software adaptation of scope consistency. Brazos has also been adapted to use the Virtual Interface (VI) Architecture, a proposed industry-standard interface for low-latency, user-level networking backed by Microsoft, Intel, and Compaq. The talk will discuss the performance gained by tailoring Brazos to exploit the specific features of the VI Architecture.
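The abstract does not define scope consistency. As a rough illustration only (this is not Brazos code), the toy Python model below assumes the usual definition: writes a node performs while holding a lock are guaranteed to become visible only to a node that later acquires the *same* lock. The class `ScopeConsistentMemory` and its methods are hypothetical, and the model simplifies heavily: each node holds at most one lock at a time, and updates are never propagated outside a scope.

```python
from collections import defaultdict


class ScopeConsistentMemory:
    """Hypothetical toy model of scope consistency (not the Brazos API).

    Each node keeps a private copy of shared memory. Writes made inside a
    lock's scope are published under that lock at release time, and are
    pulled in only by nodes that later acquire the same lock.
    """

    def __init__(self, num_nodes):
        self.local = [dict() for _ in range(num_nodes)]    # per-node memory copy
        self.pending = [dict() for _ in range(num_nodes)]  # writes in the open scope
        self.held = [None] * num_nodes                     # lock each node holds
        self.published = defaultdict(dict)                 # lock -> {addr: value}

    def acquire(self, node, lock):
        # Opening a scope imports only the updates published under this lock.
        assert self.held[node] is None, "toy model: one lock per node at a time"
        self.held[node] = lock
        self.local[node].update(self.published[lock])

    def write(self, node, addr, value):
        self.local[node][addr] = value
        if self.held[node] is not None:
            self.pending[node][addr] = value  # remember for publication

    def read(self, node, addr):
        return self.local[node].get(addr)

    def release(self, node, lock):
        # Closing a scope publishes its writes under this lock only.
        assert self.held[node] == lock, "must release the lock that is held"
        self.published[lock].update(self.pending[node])
        self.pending[node].clear()
        self.held[node] = None


if __name__ == "__main__":
    mem = ScopeConsistentMemory(2)
    mem.acquire(0, "B"); mem.write(0, "y", 2); mem.release(0, "B")
    mem.acquire(0, "A"); mem.write(0, "x", 1); mem.release(0, "A")
    mem.acquire(1, "A")
    print(mem.read(1, "x"), mem.read(1, "y"))
```

Running the script prints `1 None`: node 1 sees `x`, which was written under lock `A`, but not `y`, which was written under lock `B`. A lazy release-consistent system would also deliver `y` here, because node 0 wrote it before releasing `A`. Deferring such unrelated updates is the kind of communication saving a software scope-consistency protocol aims for.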

Current work on Brazos addresses two problems that arise when clusters are used to solve large-scale parallel problems: cluster reliability and cluster resource sharing. Brazos combines user-level thread migration with a checkpoint/recovery mechanism to provide a reliable system that tolerates single- or multiple-node failures without requiring currently running parallel applications to restart. Additionally, Brazos provides a multiprogrammed parallel environment, allowing the runtime system to exploit underutilized processors and network capacity without the expense of running multiple instances of the runtime support system on each node. The talk will present the mechanisms and performance of this ongoing work.

Refreshments will be served in ECOT 831 immediately following the talk.
Hosted by Gary Nutt.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)