Department of Computer Science, Carnegie Mellon University

3/20/2000

10:00am-11:15am

How can we learn useful representations of multidimensional data that can be used in a variety of tasks, some of them unspecified at the time of learning? This problem is called unsupervised learning, and probabilistic models have proven particularly successful at it. However, learning and using models of multidimensional domains raises specific problems, collectively termed the curse of dimensionality, that are not encountered in the univariate case. This talk shows how exploiting the computational properties of a simple probability model, the tree, leads to efficient, elegant, and powerful algorithms for learning in multidimensional domains.

The tree is distinguished among graphical models by its outstanding computational properties. I show how to combine trees into more powerful models, called mixtures of trees, and how these can be learned efficiently by a method that combines the Maximum Spanning Tree algorithm with the EM algorithm. The basic tree learning algorithm is quadratic in the dimension of the data. I demonstrate that for sparse data it can be transformed into a subquadratic algorithm that achieves speedup factors of up to a thousand. Experiments demonstrate the performance of trees and mixtures of trees in classification and density estimation tasks.
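The basic tree learning algorithm mentioned above is, in standard form, the Chow-Liu procedure: compute the mutual information between every pair of variables (the source of the quadratic cost in the dimension), then take the maximum-weight spanning tree over those weights. The following is a minimal illustrative sketch, not the speaker's implementation; the function names and the toy test data are my own assumptions.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    # Empirical mutual information between two discrete variables.
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_tree(data):
    # data: (n_samples, n_vars) array of discrete values.
    # Returns the edge list of the maximum-weight spanning tree,
    # with pairwise mutual informations as edge weights.
    n_vars = data.shape[1]
    # All pairwise weights: this loop is the quadratic step.
    edges = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    # Kruskal's algorithm with a union-find over the variables.
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

In a mixture of trees, each mixture component runs this step on posterior-weighted data inside the M step of EM; the subquadratic variant for sparse data described in the talk avoids computing most of the pairwise weights explicitly.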

No prior knowledge of graphical probability models is necessary to follow this talk.

*Hosted by Clayton Lewis.*

Department of Computer Science

University of Colorado Boulder

Boulder, CO 80309-0430 USA

webmaster@cs.colorado.edu