9/30/1996 9:30am-11:30am ECOT 831
Nonparametric Selection of Input Variables for Connectionist Learning
Computer Science PhD Candidate
When many possible input variables to a statistical model exist, removing
unimportant inputs can improve the model's performance significantly. A new
method for selecting input variables is proposed. Components for the proposed
method include:
Mutual information as a relevance measure
Kernel density estimation for estimating probabilities
Forward selection as an input variable search method
Analysis of mutual information shows that it is a natural measure of input
variable relevance. It is a more general measure of input variable relevance
than expected conditional variance. Under certain conditions, the two measures
order the relevance of input variable subsets in precisely the same manner,
but these conditions do not generally hold. An unbiased approximation to mutual
information exists, but it is unbiased only if the underlying probabilities
are exact.
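
For reference, the standard definition of mutual information between an input subset X and an output Y, together with a sample-average approximation, can be written as below. The abstract does not state which approximation is meant, so the second formula is an assumption: it is the usual sample-mean form, which is unbiased when the N samples are drawn from p and the densities in the ratio are the exact ones.

```latex
% Mutual information between inputs X and output Y (standard definition)
I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy

% Sample-average approximation over N observed pairs (x_i, y_i) drawn from p
\hat{I}(X;Y) = \frac{1}{N}\sum_{i=1}^{N}\log\frac{p(x_i,y_i)}{p(x_i)\,p(y_i)},
\qquad
\mathbb{E}\big[\hat{I}(X;Y)\big] = I(X;Y)\ \text{when } p \text{ is exact.}
```

When the densities are themselves estimated from the data, this expectation no longer holds exactly.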
Analysis of kernel density estimation shows that the accuracy of mutual
information estimates depends directly on how densely the data points are
distributed. However, for a range of explored problems, the relative
ordering of mutual information estimates remains correct, despite inaccuracies
in individual estimates.
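
The idea of ranking inputs by kernel-based mutual information estimates can be sketched as follows. This is a minimal illustration, not the thesis implementation: the Gaussian product kernel, the single fixed bandwidth, the synthetic data, and the function names `gaussian_kde` and `mutual_information` are all assumptions made for the example.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a density function built from a Gaussian product kernel.

    points is a list of equal-length tuples; bandwidth is one smoothing
    width shared by all dimensions (an illustrative simplification).
    """
    dim = len(points[0])
    norm = len(points) * (bandwidth * math.sqrt(2.0 * math.pi)) ** dim

    def density(query):
        total = 0.0
        for p in points:
            sq_dist = sum((q - c) ** 2 for q, c in zip(query, p))
            total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
        return total / norm

    return density


def mutual_information(xs, ys, bandwidth=0.3):
    """Sample-average estimate of I(X; Y) in nats from KDE densities."""
    joint = gaussian_kde([x + (y,) for x, y in zip(xs, ys)], bandwidth)
    p_x = gaussian_kde(xs, bandwidth)
    p_y = gaussian_kde([(y,) for y in ys], bandwidth)
    total = 0.0
    for x, y in zip(xs, ys):
        total += math.log(joint(x + (y,)) / (p_x(x) * p_y((y,))))
    return total / len(xs)


# Synthetic data (assumed for illustration): one input drives the target,
# the other is independent noise.
rng = random.Random(0)
n = 200
relevant = [(rng.gauss(0, 1),) for _ in range(n)]
irrelevant = [(rng.gauss(0, 1),) for _ in range(n)]
target = [x[0] + 0.3 * rng.gauss(0, 1) for x in relevant]

mi_relevant = mutual_information(relevant, target)
mi_irrelevant = mutual_information(irrelevant, target)
print("relevant input:  ", mi_relevant)
print("irrelevant input:", mi_irrelevant)
```

Neither printed value should be read as an accurate mutual information figure, since both depend on the bandwidth and sample size; the point of the sketch is only that the ordering of the two estimates can still separate the relevant input from the irrelevant one.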
Analysis of forward selection explores how much data is required to select a
given number of relevant input variables. The required amount of data is shown
to increase roughly exponentially with the number of relevant input variables
to be selected. It is also shown that the chances of forward selection ending up
in a local minimum are reduced by bootstrapping the data.
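
A greedy forward selection driven by a mutual information estimate might look like the sketch below. It is written under stated assumptions rather than taken from the thesis: the kernel estimator, the fixed bandwidth, the stopping rule (a fixed `n_select`), the column layout of `rows`, and all function names are hypothetical, and bootstrapping is left out for brevity.

```python
import math
import random


def kde_log_density(rows, cols, bandwidth):
    """Log Gaussian product-kernel density at each row, restricted to cols."""
    norm = len(rows) * (bandwidth * math.sqrt(2.0 * math.pi)) ** len(cols)
    logs = []
    for q in rows:
        total = 0.0
        for p in rows:
            sq = sum((q[c] - p[c]) ** 2 for c in cols)
            total += math.exp(-0.5 * sq / bandwidth ** 2)
        logs.append(math.log(total / norm))
    return logs


def mutual_information(rows, input_cols, target_col, bandwidth=0.4):
    """Sample-average KDE estimate of I(inputs; target) in nats."""
    joint = kde_log_density(rows, input_cols + [target_col], bandwidth)
    inputs = kde_log_density(rows, input_cols, bandwidth)
    target = kde_log_density(rows, [target_col], bandwidth)
    return sum(j - i - t for j, i, t in zip(joint, inputs, target)) / len(rows)


def forward_selection(rows, candidates, target_col, n_select):
    """Greedily add the candidate that gives the highest MI estimate."""
    selected = []
    for _ in range(n_select):
        remaining = [c for c in candidates if c not in selected]
        best = max(
            remaining,
            key=lambda c: mutual_information(rows, selected + [c], target_col),
        )
        selected.append(best)
    return selected


# Synthetic data (assumed): columns 0 and 2 drive the target in column 4,
# columns 1 and 3 are independent noise.
rng = random.Random(1)
rows = []
for _ in range(200):
    x = [rng.gauss(0, 1) for _ in range(4)]
    y = x[0] + x[2] + 0.1 * rng.gauss(0, 1)
    rows.append(x + [y])

selected = forward_selection(rows, candidates=[0, 1, 2, 3], target_col=4, n_select=2)
print("selected input columns:", selected)
```

Each step estimates a density over one more dimension than the previous step, which gives some intuition for why the data requirement grows quickly with the number of inputs selected.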
Finally, the method is compared to two connectionist methods for input variable
selection: Sensitivity Based Pruning and Automatic Relevance Determination. It
is shown that the new method outperforms these two when the number of
independent, candidate input variables is large. However, the method requires
the number of relevant input variables to be relatively small. These results
are confirmed on a number of real-world prediction problems, including the
prediction of energy consumption in a building, the prediction of heart rate in
a patient with sleep apnea, and the prediction of wind force in a wind turbine.