Department of Computer Science University of Colorado Boulder
home · events · colloquia · 1995-1996 · 

Colloquium - Pedersen

ECCR 2-28

A Statistician's View of Information Retrieval
Jan O. Pedersen
Xerox PARC

Information Retrieval is the task of identifying documents relevant to an information need. This is hard because "relevance" is difficult to objectively assess and an information need may include context that is not explicitly represented. Nonetheless, it is possible to build useful information access systems that perform remarkably well using very simple techniques. In fact, it is notoriously difficult to improve on their performance.

I will discuss why this might be the case by analyzing a few classical information retrieval tasks as problems in statistical classification. It will emerge that the high dimensionality of the feature space defeats naive attempts to improve performance. However, careful dimensionality reduction paired with appropriate classification technology yields promising results.

Jan O. Pedersen is a statistician specializing in the quantitative analysis of text for the purposes of information access. His most recent work has focused on the development of fast clustering algorithms as applied to the organization of large document collections and on the design of a software architecture for information access. His other interests have included text categorization, thesaurus induction, and document filtering and routing. Jan Pedersen has degrees from Princeton University (AB) and Stanford University (PhD). He joined Xerox Corp. in 1986 and is currently manager of the Quantitative Content Analysis Area of the Information Sciences and Technology Laboratory, Xerox PARC.

Refreshments will be served immediately before the talk at 3:30pm.
Hosted by Andreas Weigend.

The Department holds colloquia throughout the Fall and Spring semesters. These colloquia, open to the public, are typically held on Thursday afternoons, but sometimes occur at other times as well. If you would like to receive email notification of upcoming colloquia, subscribe to our Colloquia Mailing List. If you would like to schedule a colloquium, see Colloquium Scheduling.

Sign language interpreters are available upon request. Please contact Stephanie Morris at least five days prior to the colloquium.

Department of Computer Science
College of Engineering and Applied Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA

Engineering Center Office Tower
ECOT 717
FAX +1-303-492-2844