I am building a research group on efficient systems design for managing and exploring massive amounts of digital data. The large-scale data sets we currently consider include online social communities, Web data in general, multimedia data (audio, video, digital photos), environmental sensor data, and data from specific application domains such as bioinformatics, healthcare, and Earth sciences. Our main research topics are search systems, data management, data mining, distributed systems, and online social communities.
Research assistant positions (for Ph.D. students) and postdoc positions may be available. Contact me if you are interested in joining the group or working on a research project in this area.
This project investigates research issues in searching, clustering, classifying, and managing feature-rich, non-text data types such as audio, video, images, microarray gene expression data, and 3D shape models. Current research topics include sketch construction techniques, efficient filtering and indexing methods, similarity search across multiple data types, and a toolkit for similarity search. For more information, please visit the CASS project website.
This project investigates more scalable alternatives to Gnutella-like peer-to-peer networks, focusing on search methods and replication strategies. We have also studied how network heterogeneity can be exploited to build a more scalable system.
We have designed and implemented a file server using Boxwood's distributed B-tree-based storage abstraction. This work was conducted at the Microsoft Research Silicon Valley Lab.
We have designed and implemented a fully distributed content-addressable storage system, focusing on high performance, fault tolerance, and scalability.
We have designed and implemented a network performance monitor for the Google web search engine. Based on the monitoring results, we have proposed several techniques for optimizing network performance. This work was conducted at Google Inc.
We have designed and implemented a self-organized network, focusing on architecture design, routing protocol, QoS, and performance analysis.