NIPS 2004 Workshop


Calibration and Probabilistic Prediction in Supervised Learning


Friday December 17, 2004





Rich Caruana, Cornell University, Ithaca, NY, USA

Greg Grudic, University of Colorado, Boulder, CO, USA



Workshop Description:


Calibration refers to how accurately the probabilities predicted by a model correspond to empirical observations.  For example, if a model predicts probability p = 0.2 of belonging to the positive class for 1000 cases, the model is said to be well calibrated if about 200 of the 1000 cases are in the positive class.  A model is well calibrated if this is true for all predicted values of p.  Calibrated probabilistic models have important applications in fields ranging from medicine to finance to particle physics to robotics.

Despite the recent explosion of interest in probabilistic models in machine learning, there has been little work on assessing the quality of the probabilistic predictions models make.  Measuring model calibration is challenging, and classification models can have high accuracy or large ROC area yet be poorly calibrated.  In fact, learning methods such as boosting and other max-margin methods, which perform well on many other measures, sacrifice calibration as a consequence of maximizing the margin.

This workshop will focus on questions such as: When is calibration important?  How should we measure calibration?  What learning methods yield good (or bad) calibration?  Is it true that graphical models often yield poor calibration?  Can models that have excellent performance on other metrics, but poor calibration, be calibrated so that they predict good probabilities?
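The binned comparison described above can be sketched in a few lines of code. This is only an illustrative sketch, not a measure prescribed by the workshop; the function name and binning scheme (equal-width bins on [0, 1]) are our own choices for the example.

```python
import numpy as np

def calibration_by_bins(probs, labels, n_bins=10):
    """Illustrative sketch: group predictions into equal-width probability
    bins and compare the mean predicted probability in each bin to the
    empirical fraction of positives (the helper name and binning are
    hypothetical, not from the workshop materials)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Assign each prediction to one of n_bins equal-width bins on [0, 1].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.sum() == 0:
            continue
        # (mean predicted p, empirical frequency of positives, count)
        rows.append((probs[mask].mean(), labels[mask].mean(), int(mask.sum())))
    return rows
```

On the example from the description (1000 cases predicted at p = 0.2, of which 200 are positive), the bin containing p = 0.2 reports a mean prediction of 0.2 against an empirical frequency of 0.2, i.e., the model is well calibrated at that value of p.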




Call for Submissions:


Submissions on any topic appropriate for the workshop are encouraged. They should be abstracts of 1 to 2 pages (full papers will also be considered), in PDF or PS format, and emailed by Nov. 1 to


Some measures of probabilistic prediction quality are listed in loss-functions.pdf. You are free to choose any of these, or others you feel are appropriate.



Workshop Challenge:

An Evaluating Predictive Uncertainty Challenge will be part of this workshop. It is being run by Joaquin Quiñonero Candela, Carl Edward Rasmussen, and Yoshua Bengio. See


Link to Tentative Schedule



Current List of Speakers:


1.      Sham Kakade and Dean Foster, University of Pennsylvania

2.      Pedro M. Domingos, University of Washington

3.      John Platt, Microsoft Research

4.      Charles Elkan, University of California, San Diego

5.      Neil Lawrence, University of Sheffield

6.      Carl Edward Rasmussen and Joaquin Quiñonero Candela, Max Planck Institute for Biological Cybernetics

7.      John Langford, Toyota Technological Institute at Chicago, and Bianca Zadrozny, IBM T.J. Watson Research Center

8.      Caren Marzban, University of Washington

9.      Rich Caruana and Alex Niculescu, Cornell University

10.  Greg Grudic, University of Colorado at Boulder

11.  Grigoris Karakoulas, University of Toronto

12.  Paul N. Bennett, CMU

13.  Stefan Rüping, Dortmund University