BACTAC - Jiang

Effective Crawler for Online Social Networks
Yifei Jiang
Graduate Student

Online social communities are among the most popular sites on the Internet, which makes them a rich subject for study. In this project we study the problem of crawling online social communities, a fundamental step in many areas of online social network research, such as characterizing the networks themselves, building search engines, and data mining. Compared with crawling a regular website, crawling an online social network poses several challenges:

  1. The link structure between pages on online social websites is much more complicated than on other websites, so a crawler reaches a large number of duplicate pages.

  2. Online social networks list a lot of duplicate content. For example, the same users can appear both in friend lists and in community member lists.

  3. Some pages on online social websites contain no useful information, such as the user registration page and comment-entry pages, which only provide input forms for users.

In this project, we propose and implement an online social network crawler that avoids crawling the duplicate and uninformative pages described above, thereby improving crawling efficiency.
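
The three challenges above can be illustrated with a minimal sketch. It is not the crawler developed in this project; it only shows one plausible way to filter such pages. All names here (normalize_url, PageScanner, crawl, the fetch callback) are hypothetical, and the normalization rules, the exact-match content hash, and the 40-character threshold are assumptions chosen for illustration.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit


def normalize_url(url):
    """Map equivalent URLs to one canonical form (assumed rules)."""
    parts = urlsplit(url)
    # Sort query parameters so their order does not create "new" URLs.
    query = urlencode(sorted(parse_qsl(parts.query)))
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, and drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


class PageScanner(HTMLParser):
    """Collects links, text outside forms, and whether a form is present."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []
        self.has_form = False
        self._form_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "form":
            self.has_form = True
            self._form_depth += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._form_depth:
            self._form_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a form.
        if not self._form_depth and data.strip():
            self.text.append(data.strip())


def crawl(seed, fetch, min_text_chars=40, max_pages=1000):
    """Breadth-first crawl that skips duplicate and form-only pages.

    fetch(url) is a caller-supplied function returning HTML or None.
    """
    seen_urls = {normalize_url(seed)}
    seen_content = set()
    queue, kept = deque(seen_urls), []
    while queue and len(kept) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        scanner = PageScanner()
        scanner.feed(html)
        text = " ".join(scanner.text)
        # Challenge 1: enqueue each canonical URL only once.
        for href in scanner.links:
            link = normalize_url(urljoin(url, href))
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
        # Challenge 3: skip pages that offer little beyond an input form.
        if scanner.has_form and len(text) < min_text_chars:
            continue
        # Challenge 2: skip pages whose text was already seen elsewhere.
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen_content:
            continue
        seen_content.add(digest)
        kept.append(url)
    return kept
```

Passing fetch in as a parameter keeps the sketch independent of any particular HTTP library. An exact hash only catches pages with identical text; detecting near-duplicates, such as the same users listed in a different order, would need a similarity measure instead.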

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:24)