
Thesis Defense - Joseph

Dynamic Markov Model Based Prefetching
Douglas Joseph
Computer Science PhD Candidate

Markov model based prefetching is a technique that dynamically constructs an approximate Markov model from the time series of cache miss references, with the goal of predicting future miss references. The main contribution of this thesis is to show that Markov model based prefetching is a practical and effective technique for data and instruction prefetching on industry standard benchmarks.
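The idea can be sketched as follows; this is a minimal illustration of building an approximate first-order Markov model from a miss address stream, not the thesis's hardware design, and all names and the fanout parameter are illustrative:

```python
from collections import defaultdict, Counter

class MarkovPredictor:
    """Approximate Markov model over cache miss addresses.

    Each observed miss address becomes a state; edge counts record how
    often one miss followed another, so the most frequent successors of
    the current miss can be issued as prefetch candidates.
    """
    def __init__(self, fanout=4):
        self.fanout = fanout                  # max successors kept per state
        self.edges = defaultdict(Counter)     # state -> Counter of next misses
        self.prev = None                      # previous miss address seen

    def observe_miss(self, addr):
        """Record a miss and return prefetch candidates for it."""
        if self.prev is not None:
            self.edges[self.prev][addr] += 1  # update transition count
        self.prev = addr
        # Predict the most frequently observed successors of this miss.
        return [a for a, _ in self.edges[addr].most_common(self.fanout)]

# Example: a miss stream with a repeating pattern 0xA0 -> 0xB0 -> 0xC0.
p = MarkovPredictor()
for addr in [0xA0, 0xB0, 0xC0, 0xA0, 0xB0, 0xC0, 0xA0]:
    preds = p.observe_miss(addr)
print(preds)  # after training, 0xA0's predicted successor is [0xB0]
```

A hardware realization would bound the table size and use saturating counters; the unbounded dictionaries here exist only to keep the sketch short.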

This thesis examines the issues that are critical to the performance and practicality of Markov prefetching. First, metrics are devised to examine the properties of actual Markov models constructed from the miss reference time series of L1 instruction and data cache models on the workloads. Their potential as a basis for predicting future miss references is demonstrated and compared to traditional stream oriented techniques. Estimates of the transition probabilities between Markov states are derived, and upper bounds on potential prefetch accuracy and coverage are shown.
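One way such a bound can be estimated from a miss trace is sketched below; this is a simple illustration under the assumption that an oracle predictor prefetches the top-`fanout` successors of each state, not the thesis's exact formulation:

```python
from collections import defaultdict, Counter

def markov_coverage_bound(miss_stream, fanout=1):
    """Upper-bound the coverage of a Markov prefetcher with a given fanout.

    Count every observed transition; a predictor limited to the most
    frequent `fanout` successors per state can be correct at most on the
    probability mass those successors carry, so that mass over all
    transitions bounds achievable coverage.
    """
    edges = defaultdict(Counter)
    for a, b in zip(miss_stream, miss_stream[1:]):
        edges[a][b] += 1          # empirical transition counts

    total = len(miss_stream) - 1  # number of observed transitions
    covered = sum(
        sum(n for _, n in nexts.most_common(fanout))
        for nexts in edges.values()
    )
    return covered / total

# Transitions 1->2 always recur, but 2 is followed by 3 twice and 4 once.
stream = [1, 2, 3, 1, 2, 4, 1, 2, 3]
print(markov_coverage_bound(stream, fanout=1))  # -> 0.875 (7 of 8 transitions)
```

Raising `fanout` raises the bound at the cost of more prefetch traffic, which is exactly the accuracy/bandwidth tension the later sections address.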

Guided by the insights provided by these metrics, practical hardware data and instruction Markov prefetchers are proposed and evaluated. Three critical areas are investigated: storage requirements, memory bandwidth requirements, and prefetch lead time.

Markov based prefetching requires storage to record state and miss reference information for the dynamically evolving Markov model. An important issue is the working set requirements of the model. Certain parameters in the design space of Markov prefetchers have a profound effect on working set requirements. One of these is the restriction placed on the state fanout in the Markov model approximation. Another is the definition of a Markov state itself. For example, Markov models can be constructed from the data miss reference time series or from the instruction miss reference time series. In addition, various aspects of the processor state can be incorporated into Markov states (e.g., the output of branch prediction or address translation units). Another issue is the information used to record the prefetch candidates associated with Markov states. This can be full cache line addresses or, in the case of prefetching from one cache to another, cache line indices. Beyond these considerations, this thesis investigates two other approaches to the working set problem. One is to split Markov model storage into two separate spaces, a smaller one for high fanout states and a larger one for low fanout states. Since high fanout states occur less frequently than low fanout states, better storage utilization can be achieved. We also investigate providing hardware hash tables in main memory as a backup store for recently used Markov model information.
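The split-storage idea can be sketched as two fixed-capacity tables keyed by state; the capacities, fanout cutoff, and FIFO replacement below are illustrative assumptions, not the thesis's parameters:

```python
class SplitMarkovStore:
    """Sketch of split Markov-model storage: a small table for the rarer
    high-fanout states and a larger table for the common low-fanout states.
    """
    def __init__(self, hi_entries=64, lo_entries=1024, hi_fanout=4, lo_fanout=1):
        self.hi, self.lo = {}, {}  # state -> list of prefetch candidates
        self.hi_entries, self.lo_entries = hi_entries, lo_entries
        self.hi_fanout, self.lo_fanout = hi_fanout, lo_fanout

    def insert(self, state, successors):
        """Place a state in the table matching its fanout, evicting FIFO."""
        if len(successors) > self.lo_fanout:
            table, cap, fanout = self.hi, self.hi_entries, self.hi_fanout
        else:
            table, cap, fanout = self.lo, self.lo_entries, self.lo_fanout
        if state not in table and len(table) >= cap:
            # Evict the oldest entry (Python dicts preserve insertion order).
            table.pop(next(iter(table)))
        table[state] = successors[:fanout]

    def lookup(self, state):
        """Return recorded prefetch candidates for a state, if any."""
        return self.hi.get(state) or self.lo.get(state) or []

store = SplitMarkovStore()
store.insert(0xA0, [0xB0, 0xC0])  # two successors -> high-fanout table
store.insert(0xB0, [0xC0])        # one successor  -> low-fanout table
print(store.lookup(0xA0))         # [0xB0, 0xC0]
```

Because low-fanout states dominate, the large table stores only one candidate per entry, which is where the storage savings come from.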

Memory bandwidth is always an important consideration with prefetching: inaccurate prefetchers can swamp the bandwidth capacity of a memory subsystem. This thesis investigates three approaches to the problem. One is a simple noise rejection scheme that waits for two correct predictions on a Markov state transition before using predictions from that state for prefetching. A second approach is to dynamically monitor the accuracy of prefetches from each Markov state and disable prefetching when the accuracy falls below some threshold. A third approach, for data prefetching, is to dynamically monitor the miss rate of data referencing instructions and alter data cache allocation and/or prefetch policies when the miss rate associated with those instructions exceeds some threshold. Specifically, we borrow a recently reported technique that disables cache allocation for references from "missy" load instructions, and apply the same technique to also restrict prefetches associated with those instructions to cache line sectors rather than whole cache lines.
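The first of these, the noise rejection scheme, can be sketched as a per-transition confirmation counter; the class structure and unbounded counters are illustrative (real hardware would use small saturating counters):

```python
class NoiseRejectFilter:
    """Noise rejection for Markov prefetching: a transition's prediction
    is used for prefetching only after the transition has been observed
    (i.e., confirmed correct) at least `threshold` times.
    """
    def __init__(self, threshold=2):
        self.confirmations = {}   # (src_miss, dst_miss) -> times observed
        self.threshold = threshold

    def confirm(self, src, dst):
        """Call when miss `dst` actually followed miss `src`."""
        key = (src, dst)
        self.confirmations[key] = self.confirmations.get(key, 0) + 1

    def allow_prefetch(self, src, dst):
        """True once the edge has been confirmed enough times."""
        return self.confirmations.get((src, dst), 0) >= self.threshold

f = NoiseRejectFilter(threshold=2)
f.confirm(0xA0, 0xB0)
print(f.allow_prefetch(0xA0, 0xB0))  # False: only one confirmation so far
f.confirm(0xA0, 0xB0)
print(f.allow_prefetch(0xA0, 0xB0))  # True: threshold of two reached
```

The effect is that one-off, noisy transitions never trigger prefetches, so they cost no memory bandwidth.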

The third major issue we address for Markov prefetching in this thesis is prefetch lead time. A very accurate prefetcher may nonetheless be very ineffective if its prefetches are not made far enough in advance to hide the memory latency of the miss references they are predicting. It is demonstrated in this thesis that the lead time for instruction Markov prefetching is much lower than that for data Markov prefetching on all the workloads studied. It is also shown that the prefetch lead time of prefetches from a Markov state along particular classes of transitions varies in a predictable way. With this knowledge, prefetches from the classes with lower lead time can be prioritized ahead of those with higher lead time.
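The prioritization step reduces to ordering candidates by their class's expected lead time; a minimal sketch, assuming each candidate carries an (expected lead time, address) pair, both names being illustrative:

```python
import heapq

def issue_order(candidates):
    """Order prefetch candidates so that transition classes with lower
    expected lead time are issued first, since they have the least slack
    for hiding memory latency.

    `candidates` is a list of (expected_lead_time_cycles, addr) pairs.
    """
    heap = list(candidates)
    heapq.heapify(heap)  # min-heap on lead time: most urgent on top
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# The 50-cycle class is most urgent, so its address is issued first.
print(issue_order([(200, 3), (50, 1), (120, 2)]))  # -> [1, 2, 3]
```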

Finally, to validate the effectiveness of Markov prefetching, cycle level simulations of the prefetchers with a simplified processor model and detailed memory hierarchy models are evaluated. Results indicate that Markov based prefetching can reduce memory stalls to a much greater degree than stream oriented techniques on our workloads. In addition, instruction prefetching is shown to be a greater factor in reducing stalls than data prefetching (sometimes by a large degree), especially in the commercial workloads. Simulations indicate an average 54% reduction in memory stalls across all benchmarks using Markov prefetching, a 40% improvement over that achieved using stream oriented prefetchers alone.

Committee: Dirk Grunwald, Assistant Professor (Chair)
Robert (Bobby) Schnabel, Professor
Andrew Pleszkun, Department of Electrical and Computer Engineering
Gary Hachtel, Department of Electrical and Computer Engineering
Michael Lightner, Department of Electrical and Computer Engineering
Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA