Scaling Insight into Science: Assessing the value and effectiveness of machine assisted classification within a statistical system

Project funded by the National Science Foundation (NCSE-1422492)
PI: Jordan Boyd-Graber

In collaboration with James Evans and Julia Lane.

Overview

Labeling schemes such as the MeSH ontology help users understand large corpora of specialized text. However, despite their demonstrated utility, such schemes are an uncertain value proposition because both creating the labels and applying them require huge investments of resources.

For the latter problem, applying labels, automatic labeling of text data improves the value proposition by reducing cost. However, the process of creating a broadly applicable, consistent, and generalizable label set and then applying it to a dataset remains long and difficult.

To solve the problem of label creation, we present ALTO (Active Learning from Topic Overviews), an interactive tool for document labeling that uses topic models to help users assign appropriate labels to documents. We show that annotators given a topic modeling overview can label a document collection more quickly (higher value, lower cost) and that their efforts result in a more useful system (in our experiments, one with higher purity).
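The overview does not define purity, so the following is only a minimal sketch of the standard cluster purity measure, which may differ in detail from the one used in the experiments. It assumes each document has one annotator-assigned label and one reference (gold) label. The function name `purity` and the toy labels are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def purity(assigned_labels, gold_labels):
    """Standard cluster purity (an assumed definition, not taken from the project).

    Documents are grouped by their assigned label. Each group is credited
    with the count of its most frequent gold label, and the credits are
    summed and divided by the total number of documents.
    """
    if len(assigned_labels) != len(gold_labels):
        raise ValueError("label sequences must have the same length")
    if not assigned_labels:
        raise ValueError("at least one document is required")

    # Map each assigned label to a histogram of the gold labels it covers.
    groups = defaultdict(Counter)
    for assigned, gold in zip(assigned_labels, gold_labels):
        groups[assigned][gold] += 1

    # Credit each group with the size of its majority gold label.
    majority_total = sum(max(hist.values()) for hist in groups.values())
    return majority_total / len(assigned_labels)


# Hypothetical toy data: six documents, three assigned labels.
assigned = ["a", "a", "a", "b", "b", "c"]
gold = ["x", "x", "y", "y", "y", "z"]
score = purity(assigned, gold)
```

Under this definition, a score of 1.0 means every assigned label covers documents from a single gold category. Purity rises trivially as the number of distinct labels grows, so it is best compared across label sets of similar size.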


Project Team

Jordan Boyd-Graber
Assistant Professor, Computer Science (Colorado)
Forough Poursabzi
PhD Student, Computer Science (Colorado)


Acknowledgments

This work is supported by the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the researchers and do not necessarily reflect the views of the National Science Foundation.