Colloquium - Roth

Pre-Execution: Staying on the Performance Curve
University of Wisconsin - Madison

Emerging applications continue to demand geometric increases in performance. Constraints like power, manufacturing cost, and the limits of physics itself mean that this performance will have to be obtained primarily by efficient extraction and exploitation of parallelism, rather than by raw frequency or brute-force solutions that provide only diminishing returns. Pre-execution is a technique that aggressively but efficiently finds parallelism in sequential codes -- codes that cannot be explicitly parallelized.

Pre-execution directly attacks the performance problems of sequential codes -- loads and branches that are not handled well by conventional caches and branch predictors. These "problem instructions" make up only 3% of the dynamic instruction stream, but are responsible for up to 60% of total execution time. Pre-execution identifies the static branches and loads that cause the majority of problems, isolates their computations (backward slices, dataflow graphs), and "pre-executes" copies of these computations on separate threads. This decoupling effectively moves the stalls induced by problem instructions from the main thread, whose performance is externally visible, to other threads, whose performance is not. The main thread maintains instruction throughput while auxiliary threads run ahead and "consume" the latencies of future problem instructions. When the main thread catches up to a pre-executed problem instruction, it sees the instruction as already complete and avoids the stall previously associated with it. Pre-execution reduces sequential execution times by 20% and can be implemented with minimal additions to existing hardware.
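The latency-hiding idea above can be illustrated with a toy cycle count. This is only a sketch under stated assumptions, not the mechanism or the measurements from the talk: the function name `main_thread_cycles`, the one-cycle cost per instruction, the 50-cycle miss latency, and the fixed "lead" by which a helper thread starts each problem load early are all hypothetical. The trace is chosen so that 3% of its instructions are problem loads that account for 60% of baseline time, echoing the proportions quoted in the abstract.

```python
def main_thread_cycles(trace, miss_latency, lead=None):
    """Count main-thread cycles for a toy instruction trace.

    trace        -- list of bools; True marks a "problem" load that misses
    miss_latency -- stall cycles a problem load costs when nobody started it early
    lead         -- cycles by which a helper thread issues the load ahead of the
                    main thread (hypothetical fixed value); None disables
                    pre-execution entirely
    """
    cycles = 0
    for is_problem in trace:
        cycles += 1  # assumption: every instruction takes one cycle to issue
        if is_problem:
            if lead is None:
                # No helper thread: the main thread absorbs the whole miss.
                cycles += miss_latency
            else:
                # The helper already consumed `lead` cycles of the latency;
                # the main thread only waits for whatever is left.
                cycles += max(0, miss_latency - lead)
    return cycles


# 100 instructions, 3 of them problem loads (3% of the dynamic stream).
trace = [i in (10, 40, 70) for i in range(100)]

baseline = main_thread_cycles(trace, miss_latency=50)            # 100 + 3*50
partial = main_thread_cycles(trace, miss_latency=50, lead=20)    # 100 + 3*30
covered = main_thread_cycles(trace, miss_latency=50, lead=50)    # 100 + 3*0

print(baseline, partial, covered)
```

In this model the stall cycles are not eliminated; they are simply charged to the helper thread, whose time is not counted. The model ignores the cost of running the helper threads and the question of how far ahead they can realistically get, so its numbers should not be read as a prediction of real speedups.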

Amir Roth is a PhD candidate at the University of Wisconsin - Madison. His primary research interests are in the area of computer architecture -- in particular, the design of future high-performance microprocessors. He is also interested in emerging applications and their performance needs, compiler technology, and opportunities for software/hardware cooperation. Amir received his BS degree from Yale University in 1994. From 1994 to 1995, he worked as a software developer for Microsoft Corporation. He received his MS from the University of Wisconsin - Madison in 1997 and since then has been working on his thesis research with Prof. Guri Sohi. His thesis research explores pre-execution -- a new model for exposing and extracting fine-grained parallelism in sequential code. Amir has also published papers on hardware mechanisms for reusing computations; on hardware, software, and cooperative techniques for tolerating the serial memory latencies associated with the use of pointer-based data structures; and on expanded instruction set interfaces that allow the processor to exploit additional information that is available to the compiler.

Hosted by John Bennett.
Refreshments will be served immediately following the talk in ECOT 831.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)