Department of Computer Science

10/21/2004

3:30pm-4:30pm

Two desirable characteristics for regression and classification algorithms are 1) that they can efficiently and accurately build nonlinear models in very large problem domains, and 2) that they give reliable estimates of model accuracy. These are not commonly found characteristics in today's state of the art algorithms. In this talk I will argue that the Minimax Probability Machine (MPM) framework is a suitable theoretical basis for deriving algorithms that have both of these characteristics.

The Minimax Probability Machine Classification (MPMC) algorithm builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of this bound on future data. The only assumption that MPMC makes is that good estimates of the means and covariance matrices of the class distributions exist. This makes MPMC an attractive framework for classification, and published experimental results suggest that MPMC algorithms can produce classifiers that have state of the art accuracy. However, the original MPMC formulation is computationally expensive and requires extensive cross validation experiments for model selection.

In the first part of this talk I will describe two extensions to the MPMC algorithm that make the framework computationally feasible for very large problem domains. The first is the Sparse MPMC algorithm for generating sparse kernel classifiers; and the second is an MPMC algorithm for generating a polynomial cascade model. Both algorithms efficiently perform model selection, have state of the art classification performance, and can give accurate bounds on the classification accuracy on future data.

In the second part of this talk I will show how the MPMC formulation can be extended to regression. Within the Minimax Probability Machine framework, the regression problem is posed as maximizing the minimum probability that future points generated from some true regression surface, will lie within epsilon of our regression model. This MPMR algorithm gives a direct estimate of this probabilistic bound, and experimental evidence demonstrates the accuracy of this bound and the efficacy of the regression models.

Department of Computer Science

University of Colorado Boulder

Boulder, CO 80309-0430 USA

webmaster@cs.colorado.edu

University of Colorado Boulder

Boulder, CO 80309-0430 USA

webmaster@cs.colorado.edu