8/22/1996 1:00pm-3:00pm ECOT 718
|
Dynamic Markov Model Based Prefetching
Douglas J. Joseph
Computer Science PhD Candidate
Markov model based prefetching is a technique to dynamically construct an
approximate markov model from the time series of cache miss references, with
the goal of predicting future miss references. The main contribution of this
thesis is that markov model based prefetching is shown to be a practical and
effective technique for data and instruction prefetching on industry standard
benchmarks.
This thesis examines the issues that are critical to the performance and
practicality of markov prefetching. First, metrics are devised to to examine
the properties of actual markov models constructed from the miss reference time
series of L1 I and D cache models on the workloads. Their potential as a basis
for predicting future miss references is demonstrated and compared to
traditional stream oriented techniques. Estimates of the transition
probabilities between markov states are derived, and upper bounds on
potential prefetch accuracy and coverage is shown.
Guided by the insights provided by the metrics, practical hardware data and
instruction markov prefetchers are proposed and evaluated. Three critical areas
are investigated: storage requirements, memory bandwidth requirements, and
prefetch lead time.
Markov based prefetching requires storage to record state and miss reference
information for the dynamically evolving markov model. An important issue is
the working set requirements of the model. Certain parameters in the design
space of markov prefetchers have a profound effect on working set requirements.
One of these is the restriction placed on the state fanout in the markov model
approximation. Another is the definition of a markov state itself. For example,
markov models can be constructed from the data miss reference time series or
from the instruction miss reference time series. In addition, various aspects
of the processor state can be incorporated into markov states (e.g. output of
branch prediction or address translation units). Another issue is the
information used to record the prefetch candidates associated with markov
states. This can be a full cache line addresses, or in the case of prefetching
from one cache to another, a cache line index. Beyond these considerations,
this thesis investigates two other approaches to the working set problem. One
is to split markov model storage into two separate spaces, a smaller one for
high fanout states and a larger for low fanout states. Since high fanout states
occur less frequently than low fanout states, better storage utilization can be
achieved. We also investigate providing hardware hash tables in main memory as
a backup for recently used markov model information.
Memory bandwidth is always an important consideration with prefetching.
Inaccurate prefetchers can swamp the bandwidth capacity of a memory subsystem.
This thesis investigates three approaches to the problem. One is a simple noise
rejection scheme that waits for two correct predictions on a markov state
transition before using predictions from that state for prefetching. A second
approach is to dynamically monitor the accuracy of prefetches from each markov
state, and disable prefetching when it passes below some threshold. A third
approach, for data prefetching, is to dynamically monitor the miss rate of data
referencing instructions and alter data cache allocation and/or prefetch
policies when the miss rate associated with those instructions exceed some
threshold. Specifically, we borrow a recently reported technique that disables
cache allocation of references from "missy" load instructions, and apply the
same technique to also restrict prefetches associated with those instructions
to cache line sectors rather than whole cache lines.
The third major issue we address for markov prefetching in this thesis is
prefetch lead time. A very accurate prefetcher may none the less be very
ineffective if the prefetches are not made far enough in advance to hide the
memory latency of the miss references they are predicting. It is demonstrated
in this thesis that lead time for instruction markov prefetching is much lower
than that for data markov prefetching on all the workloads studied. It is also
shown that the prefetch lead time of prefetches from a markov state along
particular classes of transitions varies in a predictable way. With this
knowledge, prefetches from the classes with lower lead time can be prioritized
ahead of those with higher lead time.
Finally, to validate the effectiveness of markov prefetching, cycle level
simulations of prefetchers with a simplified processor model and detailed
memory hierarchy models are evaluated. Results indicate that markov based
prefetching can reduce memory stalls to a much greater degree than stream
oriented techniques on our workloads. In addition, instruction prefetching
is shown to be a greater factor in reducing stalls than data prefetching
(sometimes by large degrees), especially in the commercial workloads.
Simulations indicate an average 54% reduction in memory stalls across all
benchmarks using markov prefetching, which was a 40% improvement over that
achieved using stream oriented prefetchers alone.
|