Active calibration

Modern deep networks suffer from poor confidence calibration: even though model accuracy is high, the models' class posteriors do not reflect the likelihood of correctness (Guo, Pleiss, Sun, & Weinberger, 2017). Many techniques have been developed for post hoc calibration of machine learning and statistical models, including Platt scaling (Platt, 1999), histogram binning (Zadrozny & Elkan, 2001), isotonic regression (Zadrozny & Elkan, 2002), and Bayesian binning into quantiles (Naeini, Cooper, & Hauskrecht, 2015). In contrast to these post hoc methods, we introduce a calibration loss that regularizes a neural net as it is trained. We perform histogram binning by confidence for examples in a training minibatch, and the calibration loss penalizes a discrepancy between predicted and actual accuracy in each bin. This loss is unusual in that examples are binned by output similarity rather than by input similarity, as one would ordinarily expect for a smoothing regularizer.
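
The binning step described above can be illustrated with a minimal sketch. The text does not specify the form of the discrepancy, so this sketch assumes equal-width confidence bins on [0, 1] and an absolute difference between mean confidence and accuracy, weighted by the fraction of the minibatch in each bin. The function name binned_calibration_loss and its parameters are hypothetical, and a training implementation would compute the same quantity inside an automatic-differentiation framework rather than in plain Python.

```python
def binned_calibration_loss(confidences, correct, n_bins=10):
    """Weighted mean of |average confidence - accuracy| over confidence bins.

    confidences: predicted confidence per example, each assumed to lie in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    The absolute-difference penalty and equal-width bins are assumptions.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per-bin running sums of confidence and correctness, plus example counts.
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for c, y in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        hit_sum[b] += float(y)
        count[b] += 1

    loss = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        predicted_accuracy = conf_sum[b] / count[b]
        actual_accuracy = hit_sum[b] / count[b]
        loss += (count[b] / n) * abs(predicted_accuracy - actual_accuracy)
    return loss


# Hypothetical minibatch: confident predictions that are right only part of the time.
minibatch_loss = binned_calibration_loss(
    [0.95, 0.90, 0.85, 0.60, 0.55], [1, 1, 0, 1, 0], n_bins=5
)
```

Note that examples land in the same bin because their output confidences are similar, regardless of how far apart their inputs are.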

With one bin per example, our calibration loss becomes equivalent to the primary objective of maximizing classification performance. With a single bin for all examples, our calibration loss reduces to a regularizer that penalizes confident output distributions (Pereyra, Tucker, Chorowski, Kaiser, & Hinton, 2017). With an intermediate number of bins, our calibration loss is able to use statistics of model confidence to better match predicted and actual accuracy. Selecting an appropriate number of bins and making appropriate use of a validation set will be essential for ensuring that the approach does not produce degenerate solutions (e.g., situations where an overparameterized model learns a training set perfectly, leading to high confidence responses that are not penalized).
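
The effect of the number of bins can be seen in a small sketch. It assumes, purely for illustration, that examples are sorted by confidence and split into contiguous groups of nearly equal size, and that the per-bin penalty is the absolute gap between mean confidence and accuracy; the function name sorted_bin_gap is hypothetical. With one bin, only the minibatch-wide average confidence is constrained; with one bin per example, every individual confidence is compared against that example's own correctness.

```python
def sorted_bin_gap(confidences, correct, n_bins):
    """Mean |average confidence - accuracy| over equal-size bins of sorted examples.

    The sort-and-split binning and the absolute-gap penalty are assumptions
    made for this illustration.
    """
    pairs = sorted(zip(confidences, correct))
    n = len(pairs)
    if n == 0:
        return 0.0
    n_bins = max(1, min(n_bins, n))  # never more bins than examples

    total = 0.0
    for b in range(n_bins):
        # Contiguous slice of the sorted examples assigned to bin b.
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        mean_conf = sum(c for c, _ in chunk) / len(chunk)
        accuracy = sum(y for _, y in chunk) / len(chunk)
        total += (len(chunk) / n) * abs(mean_conf - accuracy)
    return total


conf = [0.9, 0.8, 0.7, 0.6]
hits = [1, 1, 0, 1]

# One bin: average confidence (0.75) matches accuracy (0.75), so the gap vanishes.
single_bin = sorted_bin_gap(conf, hits, n_bins=1)

# One bin per example: each confidence is penalized against its own 0/1 outcome.
per_example = sorted_bin_gap(conf, hits, n_bins=len(conf))
```

In this toy minibatch the single-bin gap is essentially zero while the per-example gap is not, which is one way to see why an intermediate number of bins sits between a pure confidence regularizer and the classification objective.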


Brennen McConnell (Computer Science, Colorado)