
Thesis Defense - Strohmann

Sparse Models for Supervised Learning -- Predicting with Few Examples and Few Features
Computer Science PhD Candidate

Machine learning methods are being used in application domains as diverse as robotics (for real-time computer vision), natural language processing (for document classification and translation), and bioinformatics (for gene expression analysis and protein structure prediction). One common characteristic of these applications is that they all involve large amounts of data. The data is typically organized as a set of examples, where each example has a set of features and a label. Supervised machine learning is the study of automatically building a model from a set of labeled training data. For data with categorical labels we build a classification model, and for data with continuous labels we build a regression model. Once built, the model is used on unseen data to predict labels from features.
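The two supervised settings above can be sketched with toy, hypothetical data: a nearest-neighbor rule for categorical labels (classification) and an ordinary least-squares fit for continuous labels (regression). The data and function names here are illustrative, not taken from the thesis.

```python
# Minimal sketch of the two supervised settings: categorical labels ->
# classification, continuous labels -> regression. All data is made up.

def nearest_neighbor_classify(train, query):
    """Predict the categorical label of the closest training example (1-NN)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist(ex[0], query))
    return label

def least_squares_fit(xs, ys):
    """Fit y = w*x + b by ordinary least squares (single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx

# Classification: labeled examples, each with two features.
train = [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos")]
print(nearest_neighbor_classify(train, (0.9, 0.8)))  # closest to (1, 1)

# Regression: continuous labels generated by y = 2x + 1.
w, b = least_squares_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(w, b)
```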

Many algorithms in the field of sparse learning find a model that can be evaluated using only a small number of training examples; that is, they are "predicting with few examples". Using few training examples within a model has two advantages: it speeds up the prediction process, and it decreases the chance of overfitting (building a model that does very well on training data but poorly on unseen data). "Predicting with few features" has been studied extensively in the field of feature selection. The idea of feature selection is to choose only a (small) subset of the available features for predicting. Feature selection is typically used in domains where the data contains many noisy features, i.e., features which are not relevant for predicting labels.
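One common route to feature sparsity is an L1 penalty, which drives the weights of irrelevant features exactly to zero. The sketch below (hypothetical weights, not from the thesis) uses the standard result that, for orthonormal features, the L1-penalized solution is soft-thresholding of the least-squares weights.

```python
# Illustration of "predicting with few features": under an L1 penalty,
# small weights on noisy features are set exactly to zero. For orthonormal
# features the penalized solution is soft-thresholding of the OLS weights;
# lam is the penalty strength. Weights below are made-up examples.

def soft_threshold(w, lam):
    """Shrink w toward zero by lam; weights inside [-lam, lam] become 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

# OLS weights: one strong (relevant) feature, two weak (noise) features.
ols_weights = [3.0, 0.4, -0.3]
lam = 0.5
sparse_weights = [soft_threshold(w, lam) for w in ols_weights]
print(sparse_weights)  # noise weights zeroed -> [2.5, 0.0, 0.0]
```

The model that results uses only the surviving features at prediction time, which is exactly the "few features" goal described above.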

This thesis combines "predicting with few examples" and "predicting with few features" into a single framework. The goal is to produce accurate models which use a small number of examples and a small number of features for prediction. The thesis focuses on models that are linear combinations of local basis functions, where each basis function models the data in the local neighborhood around a training example. In addition to seeking a model with few basis functions, we want the basis functions themselves to be sparse. A sparse basis function is characterized by using only the few features which are relevant in its local neighborhood. Empirical and theoretical results show the efficacy of our framework for building highly sparse models.
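A minimal sketch of this model class, under the assumption of Gaussian basis functions (the abstract does not fix a particular basis): the prediction is a weighted sum of local basis functions, each centered at a training example and each restricted to its own small subset of features. Centers, weights, and index sets below are hypothetical.

```python
import math

# Sketch of a linear combination of sparse local basis functions: each
# Gaussian basis is centered at a training example and evaluates only the
# features in its own index set. All numbers are illustrative.

def sparse_rbf(x, center, feature_idx, width=1.0):
    """Gaussian basis function using only the features in feature_idx."""
    d2 = sum((x[i] - center[i]) ** 2 for i in feature_idx)
    return math.exp(-d2 / (2 * width ** 2))

def predict(x, model):
    """model: list of (weight, center, feature_idx) triples."""
    return sum(w * sparse_rbf(x, c, idx) for w, c, idx in model)

# Two basis functions; each ignores the features outside its index set,
# so the model is sparse in both examples and features.
model = [
    (1.0, (0.0, 0.0, 5.0), (0, 1)),   # local in features 0 and 1 only
    (-0.5, (2.0, 9.0, 1.0), (0, 2)),  # local in features 0 and 2 only
]
print(predict((0.0, 0.0, 1.0), model))
```

Note that feature 2 of the first center (5.0) and feature 1 of the second (9.0) never enter the computation, which is what makes each basis function sparse.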

Committee: Gregory Grudic, Assistant Professor (Chair)
Michael Mozer, Professor
James Martin, Associate Professor
Richard Byrd, Professor
Lyle Ungar, University of Pennsylvania
Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:20)