
7.5 Regularization and Stopped Search

A central issue in choosing a neural network model for a given problem is selecting the level of structural complexity that best suits the data. If the model contains too many parameters, it approximates not only the underlying relationship but also the noise in the data; the model is then said to overfit the data. The misfit induced by noise is called the variance contribution to the model misfit, and it increases with the number of parameters. On the other hand, a model with too few parameters is not flexible enough to approximate important features in the data. This lack of flexibility gives a bias contribution to the misfit. Since flexibility increases with the number of parameters, the bias error decreases as the model grows. Choosing the amount of flexibility in a neural network model is therefore a tradeoff between these two sources of misfit, called the bias-variance tradeoff.
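
The tradeoff can be written compactly. The following is a sketch under standard assumptions that the text above does not state: the data are generated as y = f(x) + e, where e is zero-mean noise with variance \sigma^2 that is independent of the training set, and \hat f is the model fitted to a randomly drawn training set. Under these assumptions, the expected squared error at a point x splits into three terms:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat f(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Adding parameters typically shrinks the first term and grows the second. The noise term does not depend on the model.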

Overfitting may be avoided by restricting the flexibility of the neural model in some way. For neural networks, flexibility is specified by the number of hidden neurons.

The Neural Networks package offers three ways to handle the bias-variance tradeoff. All three rely on a second, independent data set, the so-called validation data, which has not been used to train the model.

• The traditional way to carry out the bias-variance tradeoff is to try different candidate neural networks, with different numbers of hidden neurons. The performance of the trained networks can then be computed on the validation data, and the best network is selected.

• By specifying a regularization parameter larger than zero, a regularized performance index is minimized instead of the original MSE. This type of regularization is often called weight decay in connection with neural networks.

• By submitting the validation data in the call to NeuralFit, you apply stopped search. The MSE is minimized with respect to the training data, but the obtained parameter estimate is the one that gave the best performance on the validation data at some intermediate iteration during the training.
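
The three approaches above can be illustrated with a small, self-contained Python sketch. It does not use the Neural Networks package or its API. A polynomial fitted by plain gradient descent stands in for a neural network, with the polynomial degree playing the role of the number of hidden neurons. All names here (`make_data`, `fit`, `decay`, and so on) are hypothetical. The quadratic penalty `decay * sum(w_k^2)` is one common form of weight decay and is assumed here; it is not taken from the package.

```python
import math
import random


def make_data(n, seed, noise=0.2):
    """Noisy samples of sin(pi*x) on [-1, 1]; a toy stand-in for real data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, math.sin(math.pi * x) + rng.gauss(0.0, noise)))
    return data


def predict(w, x):
    """Polynomial model; len(w) stands in for the number of hidden neurons."""
    return sum(wk * x ** k for k, wk in enumerate(w))


def mse(w, data):
    """Mean squared error of parameters w on a list of (x, y) pairs."""
    return sum((y - predict(w, x)) ** 2 for x, y in data) / len(data)


def fit(train, valid, degree, n_iter=500, step=0.1, decay=0.0):
    """Gradient descent on MSE(w) + decay * sum(w_k^2) over the training data.

    Returns (final_w, best_w, best_iter), where best_w is the iterate with the
    lowest validation MSE seen during training (stopped search).
    """
    w = [0.0] * (degree + 1)
    best_w, best_val, best_iter = list(w), mse(w, valid), 0
    for it in range(1, n_iter + 1):
        grad = [2.0 * decay * wk for wk in w]          # weight-decay term
        for x, y in train:
            err = predict(w, x) - y
            for k in range(degree + 1):
                grad[k] += 2.0 * err * x ** k / len(train)
        w = [wk - step * gk for wk, gk in zip(w, grad)]
        val = mse(w, valid)                            # monitored, never trained on
        if val < best_val:
            best_w, best_val, best_iter = list(w), val, it
    return w, best_w, best_iter


train = make_data(20, seed=0)
valid = make_data(20, seed=1)   # independent validation data

# 1. Model selection: train candidates of different size, compare on validation data.
candidates = {d: fit(train, valid, d)[0] for d in (1, 3, 5, 9)}
best_degree = min(candidates, key=lambda d: mse(candidates[d], valid))

# 2. Weight decay: a regularization parameter larger than zero.
w_decay, _, _ = fit(train, valid, 9, decay=0.01)

# 3. Stopped search: keep the iterate that performed best on the validation data.
w_final, w_stopped, stop_iter = fit(train, valid, 9)

print("selected degree:", best_degree)
print("validation MSE with weight decay:", mse(w_decay, valid))
print("validation MSE, final vs. stopped:", mse(w_final, valid), mse(w_stopped, valid))
print("stopped at iteration:", stop_iter)
```

In this sketch, the stopped-search estimate can never score worse on the validation data than the final iterate, because it is chosen as the best of all iterates. For the same reason, the validation error of the selected model is an optimistic estimate of its performance on new data.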

The last two of these techniques make effective use of only a subset of the parameters in the network. Hence, the number of efficient parameters becomes lower than the nominal number of parameters. This is described in the following two sections.




© 2009 Wolfram Research, Inc.