7.5.3 Example

In this small example you will see how stopped search and regularization can be used to handle the bias-variance tradeoff. The example is in a one-dimensional space so that you can look at the function. In Section 8.2.3, Bias-Variance Tradeoff: Avoiding Overfitting, a larger example is given.

Read in the Neural Networks package.

In[1]:=

Some additional standard add-on packages are needed in the example. Load them.

In[2]:=

To generate data, the true function has to be defined. It is called trueFunction here; you can change it and repeat the calculations to obtain several different examples. You can also modify the number of data samples to be generated, the noise level on the data, and the number of hidden neurons in the model.

Generate noisy data and look at it.

In[4]:=
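The data-generation step can be sketched outside the package as follows. This is a minimal, standard-library Python illustration, not the package's own code: the sine-shaped true_function, the interval [-2, 2], the sample count, and the noise level are all assumptions chosen for illustration, since the original definitions are not shown here.

```python
import math
import random

def true_function(x):
    # Hypothetical stand-in: the page's own trueFunction is not shown.
    return math.sin(2.0 * x)

def generate_data(n_samples, noise_std, seed=0):
    """Return n_samples noisy observations of true_function on [-2, 2]."""
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n_samples)]
    ys = [true_function(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys

# Assumed design parameters: 40 samples, Gaussian noise with standard deviation 0.3.
xs, ys = generate_data(n_samples=40, noise_std=0.3)
```

Changing true_function, n_samples, or noise_std gives a different instance of the same experiment.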

Look at the data and the true function.

In[8]:=

To apply stopped search you need validation data. Thus the available data is divided into two sets, training data and validation data.

Divide the data into training data and validation data.

In[9]:=
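One simple way to divide the data is to shuffle the sample indices and cut the shuffled list in two. The Python sketch below assumes an even split and uses toy data; the helper name split_data and the 50/50 fraction are illustrative choices, not something prescribed by the package.

```python
import random

def split_data(xs, ys, train_fraction=0.5, seed=0):
    """Shuffle the sample indices, then split them into two disjoint sets."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * len(xs)))
    train_idx, valid_idx = indices[:n_train], indices[n_train:]
    train = ([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    valid = ([xs[i] for i in valid_idx], [ys[i] for i in valid_idx])
    return train, valid

# Toy data standing in for the noisy samples of the example.
xs = [0.1 * i for i in range(20)]
ys = [2.0 * x for x in xs]
(train_x, train_y), (valid_x, valid_y) = split_data(xs, ys)
```

Every sample ends up in exactly one of the two sets, and each input stays paired with its output.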

The default initialization of FF and RBF networks fits the linear parameters of the network using a least-squares algorithm, as described in Section 5.1.1, InitializeFeedForwardNet, and Section 6.1.1, InitializeRBFNet. If the network is overparameterized, this may lead to very large values of the linear parameters. These large values can then cause problems in the training, especially if you want to use regularization or stopped search. There are two straightforward ways to handle this. The first is to use regularization in the initialization, which keeps the parameter values smaller. The second, which is used here, is to choose RandomInitialization -> LinearParameters so that the least-squares step is skipped and the linear parameters are chosen randomly.

Initialize an FF network.

In[13]:=

Out[13]=

Look at the initialized network, the true function, and the training data.

In[14]:=

It is now time to train the network. Validation data is submitted so that stopped search can be applied. Unless you have set CriterionLog to False, the value of the criterion for the training data and the RMSE for the validation data are written out during training. At the end of training, a message indicates the iteration at which the validation RMSE reached its minimum. The parameters at that iteration are returned and define the network model. Unless CriterionPlot is set to False, you also get a plot at the end of training showing the decrease of the criterion for the training data and the RMSE for the validation data.

The separable algorithm, described in Section 7.6, Separable Training, fits the linear parameters with the least-squares algorithm in each step of the iterative training. Hence, for the same reason that the network was initialized without least squares for the linear parameters, it might be better to train without the separable algorithm. In this way extremely large parameter values are avoided. This is done by setting Separable -> False.

Train the network.

In[15]:=
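The bookkeeping behind stopped search can be sketched in plain Python. The sketch below is an illustration under stated assumptions, not the package's training routine: it swaps in ordinary gradient descent on the mean squared error for the package's own training algorithm, uses a one-hidden-layer tanh network with six neurons, and generates hypothetical noisy samples of sin(2x). All function names are invented for the sketch. What it shows is the idea described above: record the validation RMSE at every iteration and return the parameters from the iteration where it was smallest.

```python
import math
import random

def predict(params, x):
    """One-hidden-layer network: b2 + sum_j w2[j] * tanh(w1[j] * x + b1[j])."""
    w1, b1, w2, b2 = params
    return b2 + sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(len(w1)))

def rmse(params, xs, ys):
    return math.sqrt(sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def gradient_step(params, xs, ys, lr):
    """One gradient-descent step on the mean squared error."""
    w1, b1, w2, b2 = params
    n_hidden = len(w1)
    g_w1, g_b1, g_w2, g_b2 = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden, 0.0
    for x, y in zip(xs, ys):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        err = b2 + sum(w2[j] * h[j] for j in range(n_hidden)) - y
        g_b2 += err
        for j in range(n_hidden):
            back = err * w2[j] * (1.0 - h[j] ** 2)  # error propagated through tanh
            g_w2[j] += err * h[j]
            g_w1[j] += back * x
            g_b1[j] += back
    scale = 2.0 * lr / len(xs)
    return ([w - scale * g for w, g in zip(w1, g_w1)],
            [b - scale * g for b, g in zip(b1, g_b1)],
            [w - scale * g for w, g in zip(w2, g_w2)],
            b2 - scale * g_b2)

def train_with_stopped_search(params, train, valid, n_iter, lr=0.02):
    """Train for n_iter steps; return the parameters with the lowest validation RMSE."""
    best_params, best_iter = params, 0
    valid_history = [rmse(params, *valid)]  # entry 0 is the initialized network
    for it in range(1, n_iter + 1):
        params = gradient_step(params, train[0], train[1], lr)
        valid_history.append(rmse(params, *valid))
        if valid_history[-1] < valid_history[best_iter]:
            best_params, best_iter = params, it
    # Also return the final-iteration parameters for comparison.
    return best_params, best_iter, params, valid_history

# Hypothetical data: noisy samples of sin(2x), alternately assigned to the two sets.
rng = random.Random(1)
xs = [rng.uniform(-2.0, 2.0) for _ in range(40)]
ys = [math.sin(2.0 * x) + rng.gauss(0.0, 0.3) for x in xs]
train, valid = (xs[0::2], ys[0::2]), (xs[1::2], ys[1::2])

n_hidden = 6
initial = ([rng.uniform(-1, 1) for _ in range(n_hidden)],
           [rng.uniform(-1, 1) for _ in range(n_hidden)],
           [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],  # small random linear parameters
           0.0)
best, best_iter, final, valid_history = train_with_stopped_search(initial, train, valid, 500)
print("minimum validation RMSE at iteration", best_iter)
```

Because the final-iteration parameters are returned alongside the best ones, the two estimates can be compared directly, which mirrors the comparison made later in this example.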

The obtained function estimate using stopped search can now be plotted together with the true function and the training data.

Plot the obtained estimate.

In[16]:=

If no validation data had been submitted, the model would have contained the parameters at the final iteration. These parameters can be extracted from the training record and put into the network model. In that way, you can compare the result obtained with stopped search, shown in the previous plot, with the result you would have received without stopped search.

Put in the parameters at the final iteration.

In[17]:=

Look at the estimate without using stopped search.

In[19]:=

Compare this with the earlier plot where stopped search was used.

Consider now regularization instead of stopped search. As explained already, the linear parameters of an initialized network might become very large when using the default initialization, due to the least-squares step. This may cause problems when regularization is applied, since the regularization term of the criterion dominates if the parameters are extremely large. As mentioned, there are two ways to handle this: using regularization in the initialization as well, or skipping the least-squares step.

For the same reason, problems may also arise when using the separable algorithm together with regularization. To avoid that, you can set Separable -> False in the training. You can supply validation data as before. By inspecting the RMSE criterion on the validation data you can see whether the regularization parameter is of appropriate size. Too small a value gives a validation error that increases toward the end of the training.

Train the network using regularization.

In[20]:=
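The effect of the regularization parameter can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the package: the criterion is assumed to be the mean squared error plus delta times the sum of squared parameters, the model is simplified to fixed Gaussian basis functions with only linear parameters, and plain gradient descent is used. The centers, width, data, and delta values are hypothetical.

```python
import math
import random

CENTERS = [-2.0 + 0.5 * k for k in range(9)]  # hypothetical fixed basis centers
WIDTH = 0.5                                   # hypothetical basis width

def basis(x):
    return [math.exp(-((x - c) / WIDTH) ** 2) for c in CENTERS]

def predict(weights, x):
    return sum(w * phi for w, phi in zip(weights, basis(x)))

def rmse(weights, xs, ys):
    return math.sqrt(sum((predict(weights, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def train_regularized(xs, ys, delta, n_iter=500, lr=0.2):
    """Gradient descent on: mean squared error + delta * sum(w ** 2)."""
    weights = [0.0] * len(CENTERS)
    features = [basis(x) for x in xs]
    for _ in range(n_iter):
        grad = [2.0 * delta * w for w in weights]  # gradient of the penalty term
        for phi, y in zip(features, ys):
            err = sum(w * p for w, p in zip(weights, phi)) - y
            for k, p in enumerate(phi):
                grad[k] += 2.0 * err * p / len(xs)  # gradient of the squared error
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Hypothetical data: noisy samples of sin(2x), alternately assigned to the two sets.
rng = random.Random(2)
xs = [rng.uniform(-2.0, 2.0) for _ in range(40)]
ys = [math.sin(2.0 * x) + rng.gauss(0.0, 0.3) for x in xs]
train, valid = (xs[0::2], ys[0::2]), (xs[1::2], ys[1::2])

# Print delta, squared parameter norm, and validation RMSE for a few values.
for delta in (0.0, 0.01, 0.1):
    w = train_regularized(train[0], train[1], delta)
    print(delta, sum(v * v for v in w), rmse(w, *valid))
```

A larger delta pulls the parameters toward zero. Comparing the validation RMSE across values of delta is one way to judge whether the regularization parameter is of appropriate size.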

Look at the estimate obtained using regularization.

In[21]:=

Compare the result with the ones obtained using stopped search and using ordinary training without stopped search.

You can modify the design parameters and repeat the example. Try networks of different sizes, and with several layers. Try an RBF network.

© 2009 Wolfram Research, Inc.