
Colloquium - Madhyastha

Automatic Input/Output Access Pattern Classification
Tara Madhyastha
Carnegie Mellon University

Input/output (I/O) systems are a critical performance bottleneck for many data-intensive applications (e.g., data mining, satellite data processing). In an attempt to meet the bandwidth requirements of these applications, storage architectures often decluster file data over multiple drives or disk arrays. This added file system complexity makes I/O systems extremely sensitive to I/O access patterns, and sequential file access becomes harder to achieve.

We examine how automatic input/output access pattern classification can determine application access patterns at execution time, guiding adaptive file system policies. We compare two novel input/output access pattern classification methods based on learning algorithms. The first approach uses a feedforward neural network previously trained on access pattern benchmarks to generate qualitative classifications. The second approach uses hidden Markov models trained on access patterns from previous executions to create a probabilistic model of input/output accesses.
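As a rough illustration of the second approach, the sketch below trains a model on access traces from earlier executions and uses it to suggest the next block to prefetch. It is deliberately simplified: it uses a first-order Markov chain over directly observed block numbers rather than a hidden Markov model, and the function names (`train_transitions`, `predict_next`) and the sample traces are hypothetical, not taken from the talk.

```python
from collections import Counter, defaultdict


def train_transitions(traces):
    """Count block-to-block transitions observed in earlier executions."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, block):
    """Return (most likely next block, estimated probability), or None if unseen."""
    followers = counts.get(block)
    if not followers:
        return None
    nxt, hits = followers.most_common(1)[0]
    return nxt, hits / sum(followers.values())


# Hypothetical traces: block numbers read during two earlier runs.
previous_runs = [
    [0, 1, 2, 3, 0, 1, 2, 3],
    [0, 1, 2, 0, 1, 2, 3],
]
model = train_transitions(previous_runs)

# After reading block 2, the model suggests a block to prefetch.
hint = predict_next(model, 2)
```

Here `hint` pairs the predicted block with an estimated probability, which a file system policy could compare against a threshold before issuing a prefetch. A block that never appeared in training yields `None`, so no hint is generated for it.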

One weakness of classification is that it cannot determine whether I/O requests can be reordered, an important optimization for "collective" access patterns. We demonstrate that full reordering information is not actually necessary to optimize performance for these patterns. Thus, even collective access patterns are candidates for automatic optimization within a parallel file system, where classification generates hints for prefetching.

We present results from parallel and sequential benchmarks and applications that demonstrate the viability of this approach, with execution time speedups of a factor of 2 to 4 on production scientific codes.

Hosted by Gary Nutt.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)