Similarity Search for Large-Scale Feature-Rich Data
Stony Brook University
Content-based similarity search for feature-rich data (such as digital photos,
audio, video, and scientific sensor data) is a difficult problem because the
data is high-dimensional and usually massive in volume. The main challenge is
to achieve high search quality at high speed with low space usage.
This talk presents several techniques to address the problem of building
efficient similarity search systems for large-scale feature-rich data. The
first is a sketch construction algorithm for compact metadata representation,
which can typically reduce the metadata size by an order of magnitude with
minimal impact on search quality. The second is a multi-probe
locality-sensitive hashing (LSH) technique for indexing high-dimensional data,
which substantially improves upon previous methods in both space and time
efficiency.
We have also developed Ferret, a general-purpose toolkit for building
efficient similarity search systems. The Ferret toolkit has been
successfully used to build similarity search systems for digital images, speech
recordings, video, 3D shape models, and microarray gene expression data.
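
To make the two ideas above concrete, here is a minimal, self-contained sketch in Python. It is not the algorithm presented in the talk and not the Ferret API. It assumes sign-of-random-projection bit sketches as the compact representation, and it approximates multi-probing by also looking in buckets whose key differs from the query's key in one bit. All function names and parameters (make_hyperplanes, sketch, probe_keys, num_bits, and so on) are hypothetical and chosen for illustration only.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes in dim dimensions (illustrative choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def sketch(vec, planes):
    """Compress a feature vector into an integer bit sketch.

    Bit i is set when vec lies on the non-negative side of hyperplane i.
    """
    bits = 0
    for i, plane in enumerate(planes):
        if sum(p * x for p, x in zip(plane, vec)) >= 0.0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two sketches (a cheap distance proxy)."""
    return bin(a ^ b).count("1")


def build_index(vectors, planes):
    """Hash every vector into a single table keyed by its sketch."""
    index = {}
    for item_id, vec in enumerate(vectors):
        index.setdefault(sketch(vec, planes), []).append(item_id)
    return index


def probe_keys(key, num_bits):
    """Yield the query's own bucket, then every bucket one bit flip away."""
    yield key
    for i in range(num_bits):
        yield key ^ (1 << i)


def query(index, vec, planes, multi_probe=True):
    """Collect candidate ids from one bucket, or from nearby buckets as well."""
    key = sketch(vec, planes)
    keys = probe_keys(key, len(planes)) if multi_probe else [key]
    candidates = []
    for k in keys:
        candidates.extend(index.get(k, []))
    return candidates
```

In this simplified setting, probing neighboring buckets lets a single hash table return candidates that would otherwise require several independent tables, which is the intuition behind the space savings. A real system would then rank the candidates by their true feature distance.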
Qin (Christine) Lv is an Assistant Professor in the Computer Science
Department, Stony Brook
University (SUNY). She received her BE degree from Tsinghua University in 2000
and PhD degree in Computer Science from Princeton University in 2006.
Lv's primary research interest is developing efficient systems for managing and
exploring massive amounts of digital data. Rooted in systems, her research also
draws on algorithm design, data mining, and machine learning, as well as
specific application domains such as multimedia, bioinformatics,
healthcare, and scientific computing.