Department of Computer Science University of Colorado Boulder
Colloquia, 2006-2007

Colloquium - Maltzahn

ECCR 265

Ceph: A Peta-Scale File System
University of California, Santa Cruz

As the size and performance requirements of storage systems have increased, file system designers have looked to new architectures to facilitate system scalability. The emerging object-based storage paradigm diverges from server-based (e.g., NFS) and SAN-based storage systems by coupling processors and memory with disk drives. This allows systems to delegate low-level allocation to object storage devices (OSDs) and to decouple I/O (read/write) from metadata (file open/close) operations. However, even recent object-based systems inherit a variety of decades-old architectural choices going back to early UNIX file systems, which limits their ability to scale effectively to hundreds of petabytes.


At the Santa Cruz Storage Systems Research Center (SSRC) we have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable OSDs. We leverage device intelligence by distributing data replication, failure detection and recovery to semi-autonomous OSDs running a specialized local object file system. A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general purpose and scientific computing file system workloads. Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.

The talk ends with a quick overview of what else is going on at the SSRC.

Dr. Carlos Maltzahn is an Assistant Research Computer Scientist in the Computer Science Department of the Jack Baskin School of Engineering and a faculty member of the Storage Systems Research Center, both at the University of California, Santa Cruz. He is also Executive Director of the UCSC/Los Alamos Institute for Scalable Scientific Data Management. Dr. Maltzahn's current research interests include scalable file system data and metadata management, very long-term preservation, network intermediaries, machine learning, information retrieval, and cooperation dynamics. Dr. Maltzahn joined UC Santa Cruz in January 2005 after five years at Network Appliance. He received his PhD in Computer Science from the University of Colorado at Boulder in 1999, his MS in Computer Science in 1997, and his Univ. Diplom Informatik from the University of Passau, Germany, in 1991.

Hosted by Gregory Grudic.

The Department holds colloquia throughout the Fall and Spring semesters. These colloquia, open to the public, are typically held on Thursday afternoons, but sometimes occur at other times as well. If you would like to receive email notification of upcoming colloquia, subscribe to our Colloquia Mailing List. If you would like to schedule a colloquium, see Colloquium Scheduling.

Sign language interpreters are available upon request. Please contact Stephanie Morris at least five days prior to the colloquium.

Department of Computer Science
College of Engineering and Applied Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA

Engineering Center Office Tower
ECOT 717
FAX +1-303-492-2844
©2012 Regents of the University of Colorado
May 5, 2012 (13:29)