
Colloquium - Karlin

Runtime Prediction of Fused Linear Algebra in a Compiler Framework
Department of Computer Science

On modern processors, data transfer has overtaken floating-point operations as the predominant cost in many linear algebra computations. For these memory-bound calculations, reducing data movement is often the only way to significantly increase their speed. One tuning technique that focuses on reducing memory accesses is loop fusion. However, determining how much loop fusion to apply to a routine is difficult, because fusion can either decrease or increase memory traffic.
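As a minimal illustration of loop fusion (not taken from the talk), consider two vector operations where the second consumes the result of the first. The function names and the specific operations below are hypothetical, chosen only to show how fusion removes the intermediate vector.

```python
def unfused(a, x, y, z):
    """Two separate loops: w = a*x + y, then r = w + z."""
    n = len(x)
    w = [0.0] * n
    for i in range(n):           # loop 1: reads x and y, writes w
        w[i] = a * x[i] + y[i]
    r = [0.0] * n
    for i in range(n):           # loop 2: reads w back, reads z, writes r
        r[i] = w[i] + z[i]
    return r


def fused(a, x, y, z):
    """One loop computing the same result without storing w."""
    n = len(x)
    r = [0.0] * n
    for i in range(n):
        w_i = a * x[i] + y[i]    # intermediate value stays in a scalar
        r[i] = w_i + z[i]        # reads z, writes r
    return r


# Both variants compute the same values on a small hypothetical input.
result_unfused = unfused(2.0, [1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
result_fused = fused(2.0, [1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
```

Counting vector element accesses, the unfused version touches about 6n elements (x, y, w written, w read, z, r) while the fused version touches about 4n (x, y, z, r). In this simple case fusion helps. Fusing more loops can also hurt, for example when the combined loop body works on more data than fits in cache, which is why the right amount of fusion is hard to choose.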

In this talk, we present a model that predicts data movement throughout the memory hierarchy for fused linear algebra calculations. We show how to convert memory traffic predictions to runtime estimates that are used to compare loop fusion variants on serial and shared-memory parallel machines. The model is integrated into a compiler, where its predictions let the compiler examine the search space of candidate routines efficiently, often reducing compile times by 99% or more. Additionally, the kernels the compiler produces with the model enabled are usually the same as the optimal kernels for the target architecture found by exhaustively testing all possible loop fusion combinations.
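The general idea of turning traffic predictions into runtime estimates can be sketched as follows. This is a toy stand-in, not the model presented in the talk: it assumes runtime is the sum of bytes moved at each memory level divided by that level's bandwidth, with no overlap between levels. The level names, bandwidth figures, and traffic numbers are all hypothetical placeholders.

```python
def estimate_runtime(bytes_moved, bandwidth):
    """Estimate seconds as the sum over levels of bytes / (bytes per second).

    Assumption: transfers at different levels do not overlap.
    """
    return sum(bytes_moved[level] / bandwidth[level] for level in bytes_moved)


def pick_variant(variants, bandwidth):
    """Return the name of the variant with the smallest estimated runtime."""
    return min(variants, key=lambda name: estimate_runtime(variants[name], bandwidth))


# Hypothetical machine: bandwidths in bytes per second (placeholder values).
bandwidth = {"cache": 100e9, "dram": 10e9}

# Hypothetical predicted traffic in bytes for two loop fusion variants,
# based on 6n versus 4n accesses to 8-byte elements with n = 1,000,000.
variants = {
    "unfused": {"cache": 48e6, "dram": 48e6},
    "fused": {"cache": 32e6, "dram": 32e6},
}

best = pick_variant(variants, bandwidth)
```

Ranking variants by an estimate like this avoids compiling and timing each one, which is the kind of saving the abstract describes. A real model would need to predict the per-level traffic itself, which is the hard part and is not attempted here.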

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)