Colloquia 2006-2007

Colloquium - Maltzahn

Ceph: A Peta-Scale File System
University of California, Santa Cruz

As the size and performance requirements of storage systems have increased, file system designers have looked to new architectures to facilitate scalability. The emerging object-based storage paradigm diverges from server-based (e.g., NFS) and SAN-based storage systems by coupling processors and memory with disk drives. This allows systems to delegate low-level allocation to object storage devices (OSDs) and to decouple I/O (read/write) from metadata (file open/close) operations. However, even recent object-based systems inherit a variety of decades-old architectural choices dating back to early UNIX file systems, which limits their ability to scale effectively to hundreds of petabytes.

At the Santa Cruz Storage Systems Research Center (SSRC) we have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function, CRUSH (Controlled Replication Under Scalable Hashing), designed for heterogeneous and dynamic clusters of unreliable OSDs. We leverage device intelligence by distributing data replication, failure detection, and recovery to semi-autonomous OSDs running a specialized local object file system. A dynamic distributed metadata cluster provides highly efficient metadata management and seamlessly adapts to a wide range of general-purpose and scientific computing file system workloads. Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
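The abstract does not spell out how CRUSH works. As a minimal sketch of the underlying idea only, assuming a flat cluster of equally weighted OSDs, the Python below uses rendezvous (highest-random-weight) hashing as a simplified stand-in: any client can compute where an object and its replicas live from the object name and the OSD list alone, with no allocation table. This is not Ceph's code; CRUSH additionally handles device weights and hierarchical failure domains. The functions `object_name` and `place`, the object-naming format, and the "first OSD is primary" convention are all hypothetical.

```python
from __future__ import annotations

import hashlib


def object_name(ino: int, stripe_index: int) -> str:
    # Hypothetical naming scheme: file data is striped over objects named
    # by inode number and stripe index, so no per-file block list is needed.
    return f"{ino:x}.{stripe_index:08x}"


def _score(obj: str, osd: str) -> int:
    # Deterministic pseudo-random weight for an (object, OSD) pair.
    digest = hashlib.sha256(f"{obj}/{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Return the OSDs holding `obj`; the first is treated as primary.

    Rendezvous (highest-random-weight) hashing: the result depends only on
    the object name and the OSD list, so any client can compute it without
    consulting an allocation table.
    """
    if replicas > len(osds):
        raise ValueError("fewer OSDs than requested replicas")
    ranked = sorted(osds, key=lambda osd: _score(obj, osd), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster = [f"osd{i}" for i in range(8)]
    objs = [object_name(0x10000000000, i) for i in range(1000)]
    before = {o: place(o, cluster) for o in objs}
    after = {o: place(o, cluster + ["osd8"]) for o in objs}
    moved = sum(before[o] != after[o] for o in objs)
    # Only objects for which the new OSD ranks in the top three change placement.
    print(f"{moved} of {len(objs)} placements changed after adding osd8")
```

Because placement in this sketch is a pure function of the object name and cluster membership, adding a device relocates only the objects that the new device now "wins," and removing a device that holds no replica of an object leaves that object where it is, rather than triggering a global reshuffle.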

The talk concludes with a brief overview of other ongoing research at the SSRC.

Dr. Carlos Maltzahn is an Assistant Research Computer Scientist in the Computer Science Department of the Jack Baskin School of Engineering and a faculty member of the Storage Systems Research Center, both at the University of California, Santa Cruz. He is also Executive Director of the UCSC/Los Alamos Institute for Scalable Scientific Data Management. Dr. Maltzahn's current research interests include scalable file system data and metadata management, very long-term preservation, network intermediaries, machine learning, information retrieval, and cooperation dynamics. He joined UC Santa Cruz in January 2005 after five years at Network Appliance. He received his PhD in Computer Science from the University of Colorado at Boulder in 1999, his MS in Computer Science in 1997, and his Univ. Diplom Informatik from the University of Passau, Germany, in 1991.

Hosted by Gregory Grudic.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)