Colloquium - Craven

Combining Relational and Statistical Methods for Information Extraction
Carnegie Mellon University

For many applications, information in text form remains a greatly underutilized resource. In a pair of related projects at CMU, I have been developing new methods that enable information represented in text and hypertext sources to be used as if it were in a structured representation, such as a knowledge base or a database. In the WebKB project, we are developing methods for automatically constructing knowledge bases by extracting information from the Web. In the BioKB project, we are developing methods for extracting knowledge-base facts from on-line biomedical text sources, such as MEDLINE.

One focus of my research has been to develop a new learning approach that combines a relational learning algorithm with a statistical text classification method. This algorithm often induces more accurate classifiers than either a purely statistical method or a purely relational one. I will describe this approach and discuss its application to extracting information from both the Web and biomedical abstracts.

Hosted by James Martin.

