
Colloquium - Lv

Similarity Search for Large-Scale Feature-Rich Data
Stony Brook University

Content-based similarity search for feature-rich data (such as digital photos, audio, video, and scientific sensor data) is difficult because the data are high-dimensional and usually massive in volume. The main challenge is to achieve high-quality similarity search at high speed and with low space usage.

This talk presents several techniques to address the problem of building efficient similarity search systems for large-scale feature-rich data. The first is a sketch construction algorithm for compact metadata representation, which can typically reduce the metadata size by an order of magnitude with minimal impact on search quality. The second is a multi-probe locality sensitive hashing (LSH) technique for indexing high-dimensional data, which substantially improves upon previous methods in both space and time efficiency. We have also developed Ferret, a general-purpose toolkit for building efficient similarity search systems. The Ferret toolkit has been successfully used to build similarity search systems for digital images, speech recordings, video, 3D shape models, and microarray gene expression data.
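As a rough illustration of the two ideas named above (compact sketches and probing more than one hash bucket per query), the following is a minimal sketch and not the speaker's algorithms. It swaps in a simpler, well-known technique: random-hyperplane bit sketches compared by Hamming distance, with a probe sequence that visits the query's own bucket and then every bucket one bit away. All function names and parameters here are hypothetical and are not part of the Ferret toolkit.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def sketch(vector, hyperplanes):
    """Reduce a feature vector to one bit per hyperplane (sign of the dot product)."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of positions where two bit sketches differ."""
    return sum(x != y for x, y in zip(a, b))

def build_index(vectors, hyperplanes):
    """Hash table from bit sketch to the ids of the vectors that produced it."""
    index = {}
    for i, v in enumerate(vectors):
        index.setdefault(sketch(v, hyperplanes), []).append(i)
    return index

def probe_sequence(key):
    """Yield the home bucket, then each bucket differing from it in exactly one bit."""
    yield key
    for i in range(len(key)):
        yield key[:i] + (1 - key[i],) + key[i + 1:]

def query(q, index, hyperplanes):
    """Collect candidate ids from the home bucket and its one-bit neighbours."""
    candidates = []
    for bucket in probe_sequence(sketch(q, hyperplanes)):
        candidates.extend(index.get(bucket, []))
    return candidates
```

Probing neighbouring buckets in a single table is what lets a multi-probe scheme get by with fewer hash tables than one that checks only the home bucket. Candidates returned by `query` would normally be re-ranked, for example by `hamming` distance between sketches or by a distance on the original vectors.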

Qin (Christine) Lv is an Assistant Professor in the Computer Science Department, Stony Brook University (SUNY). She received her BE degree from Tsinghua University in 2000 and her PhD degree in Computer Science from Princeton University in 2006. Lv's primary research interest is developing efficient systems for managing and exploring massive amounts of digital data. Rooted in systems, her research also interacts with the areas of algorithm design, data mining, machine learning, and specific application domains such as multimedia, bioinformatics, healthcare, and scientific computing.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)