8.2.4 Bias-Variance Tradeoff - Avoiding Overfitting
As described in Section 7.5, Regularization and Stopped Search, it is critical to choose an appropriate type of model and an appropriate number of model parameters.
This example demonstrates three ways to avoid overfitting: choosing a network with sufficiently few neurons, using stopped search for the minimum, and applying regularization.
The data from the hydraulic actuator in the previous example is used to demonstrate these options. Refer to the previous example for an introduction to this data set.
Load the Neural Networks package and the data.
The first half of the data set is used for identification and the second half for validation of the model.
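The split itself is simple. Below is a minimal Python sketch. It assumes the input and output samples are held in two plain lists, `u` and `y`. These are hypothetical names and not the package's data format. With an odd number of samples, this version gives the extra sample to the validation half.

```python
def split_half(u, y):
    """Split input/output samples into identification and validation halves."""
    if len(u) != len(y):
        raise ValueError("u and y must have the same length")
    n = len(y) // 2  # the first n samples are used for identification
    return (u[:n], y[:n]), (u[n:], y[n:])


# Usage with a tiny made-up signal
u = [0.1, 0.4, -0.2, 0.3, 0.0, -0.5]
y = [0.0, 0.2, 0.1, -0.1, 0.3, 0.2]
(u_id, y_id), (u_val, y_val) = split_half(u, y)
print(len(y_id), len(y_val))  # 3 3
```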
Train a nonlinear neural ARX model, based on an FF (feedforward) network with many hidden neurons (eight), on the hydraulic actuator data.
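The Python sketch below shows what a one-step neural ARX predictor computes. It assumes tanh hidden units, a linear output neuron, and the conventional ARX orders: `na` (number of past outputs), `nb` (number of past inputs), and `nk` (input delay). The function names and weight layout are hypothetical and need not match how the Neural Networks package stores the network.

```python
import math


def regressor(y, u, t, na, nb, nk):
    """ARX regressor at time t: [y(t-1)..y(t-na), u(t-nk)..u(t-nk-nb+1)]."""
    past_y = [y[t - i] for i in range(1, na + 1)]
    past_u = [u[t - nk - j] for j in range(nb)]
    return past_y + past_u


def ff_predict(x, W, b, v, c):
    """One hidden layer: yhat = sum_j v[j] * tanh(W[j] . x + b[j]) + c."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
              for row, bj in zip(W, b)]
    return sum(vj * hj for vj, hj in zip(v, hidden)) + c


# Usage: two hidden neurons acting on a regressor with na = nb = 2, nk = 1
y = [0.0, 0.1, 0.3, 0.2, 0.4]
u = [1.0, 0.5, -0.5, 0.2, 0.0]
x = regressor(y, u, t=4, na=2, nb=2, nk=1)
W = [[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, -0.3, 0.2]]
print(ff_predict(x, W, b=[0.0, 0.1], v=[1.0, -0.5], c=0.05))
```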
Depending on the initialization, the training ends up in different local minima. If the result is not satisfactory, repeat the training; each new run uses a new initialization.
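The effect of the initialization can be reproduced on a toy problem. The Python sketch below runs plain gradient descent on a made-up one-dimensional criterion with two local minima. Gradient descent here is a stand-in, not the package's training algorithm. The sketch repeats the training from random initial values and keeps the best run. All names are hypothetical.

```python
import math
import random


def criterion(w):
    """Made-up nonconvex criterion with local minima near w = -1 and w = +1."""
    return (w * w - 1.0) ** 2 + 0.3 * w


def gradient(w):
    """Derivative of criterion(w)."""
    return 4.0 * w * (w * w - 1.0) + 0.3


def train(w0, step=0.01, iterations=2000):
    """Plain gradient descent from the initial value w0."""
    w = w0
    for _ in range(iterations):
        w -= step * gradient(w)
    return w


def train_with_restarts(restarts, seed=0):
    """Repeat the training from random initial values and keep the best minimum."""
    rng = random.Random(seed)
    best_w, best_value = None, math.inf
    for _ in range(restarts):
        w = train(rng.uniform(-2.0, 2.0))
        if criterion(w) < best_value:
            best_w, best_value = w, criterion(w)
    return best_w, best_value


print(train(1.5), train(-1.5))  # different initial values, different minima
print(train_with_restarts(20))  # the restart with the lowest criterion
```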
Evaluate the model on validation data using a four-step prediction.
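In the usual system identification sense, a four-step prediction of y(t) works as follows. It uses measured outputs only up to time t - 4 and feeds the model's own predictions back for the intermediate steps. Measured inputs are always used. The Python sketch below implements this k-step prediction. It assumes a generic one-step predictor `step(past_y, past_u)` with ARX orders `na`, `nb`, and `nk`; these are hypothetical names, not the package's API.

```python
def k_step_prediction(y, u, k, step, na, nb, nk):
    """Return (start, preds), where preds[i] is the k-step prediction of y[start + i].

    step(past_y, past_u) is a one-step predictor taking
    past_y = [y(t-1), ..., y(t-na)] and past_u = [u(t-nk), ..., u(t-nk-nb+1)].
    """
    start = max(na, nk + nb - 1) + k - 1  # first t with all needed lags available
    preds = []
    for t in range(start, len(y)):
        predicted = {}  # model outputs for times after t - k
        for tau in range(t - k + 1, t + 1):
            past_y = []
            for i in range(1, na + 1):
                s = tau - i
                # measured output up to time t - k, the model's own prediction afterwards
                past_y.append(y[s] if s <= t - k else predicted[s])
            past_u = [u[tau - nk - j] for j in range(nb)]
            predicted[tau] = step(past_y, past_u)
        preds.append(predicted[t])
    return start, preds


def first_order(past_y, past_u):
    """Example one-step predictor: y(t) = 0.5*y(t-1) + u(t-1)."""
    return 0.5 * past_y[0] + past_u[0]


print(k_step_prediction([1.0, 0.0, 0.0, 0.0], [0.0] * 4, 2, first_order,
                        na=1, nb=1, nk=1))  # (2, [0.25, 0.0])
```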
The prediction result depends on which local minimum the training converged to, but it is usually worse than that of a linear model. Try a linear model for comparison.
Estimate a linear model and display the four-step prediction.
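For comparison, the simplest linear ARX model can be fitted by ordinary least squares. The Python sketch below fits a first-order model y(t) = a y(t-1) + b u(t-1) by solving the 2-by-2 normal equations with Cramer's rule. The model order and the names are assumptions for illustration, since the orders used in the example are not shown here.

```python
import math


def fit_arx11(y, u):
    """Least-squares fit of y(t) = a*y(t-1) + b*u(t-1); returns (a, b)."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        p1, p2 = y[t - 1], u[t - 1]  # regressor [y(t-1), u(t-1)]
        s_yy += p1 * p1
        s_yu += p1 * p2
        s_uu += p2 * p2
        r_y += p1 * y[t]
        r_u += p2 * y[t]
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0:
        raise ValueError("regressors are linearly dependent; excite the system more")
    a = (r_y * s_uu - s_yu * r_u) / det
    b = (s_yy * r_u - s_yu * r_y) / det
    return a, b


# Usage: noise-free data from a = 0.7, b = 0.3 is recovered up to rounding
u = [math.sin(0.7 * t) + 0.5 * math.cos(2.3 * t) for t in range(50)]
y = [0.0]
for t in range(1, 50):
    y.append(0.7 * y[t - 1] + 0.3 * u[t - 1])
print(fit_arx11(y, u))
```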
The nonlinear model performed worse than the linear model because it had more degrees of freedom than necessary; that is, it used more parameters than necessary. The linear model, in contrast, used fewer parameters than necessary. Choosing a number of hidden neurons between 0 and 8 may therefore give a better model. Choosing the model size in this way is the first way to handle the bias-variance tradeoff.
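A rough parameter count shows how quickly the degrees of freedom grow with the number of hidden neurons. The Python sketch below makes several assumptions: one hidden layer, a bias on every neuron, a single linear output, no separate linear part, and, purely for illustration, four regressors. The actual count for the model in this example depends on its regressor structure and may differ.

```python
def n_parameters(n_inputs, n_hidden):
    """Parameter count of a one-hidden-layer network with one linear output.

    n_hidden = 0 is treated as a linear model: one weight per input plus an offset.
    """
    if n_hidden == 0:
        return n_inputs + 1
    hidden = n_hidden * (n_inputs + 1)  # input weights and a bias per hidden neuron
    output = n_hidden + 1               # output weights and the output bias
    return hidden + output


for nh in (0, 4, 8):
    print(nh, n_parameters(4, nh))
# prints: 0 5 / 4 25 / 8 49
```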
Estimate and predict a model with four hidden neurons.
Is the result better than those of the linear model and the model with eight neurons? Try other numbers of neurons, and repeat the training for each.
You can also supply the validation data to the training algorithm. The criterion is then also evaluated on the validation data after each iteration. As described in Section 7.5, Regularization and Stopped Search, the most important parameters are adapted at the beginning of the training and the least important ones at the end. At the end of the training, a plot of the validation criterion versus the iteration number shows how the performance of the network model evolved. If the validation criterion starts to increase after some number of iterations, the model is overtrained.
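Reading the validation plot amounts to a simple check on the sequence of validation criterion values. The Python sketch below assumes those values are available as a list, with entry 0 for the initial parameters. This list representation is hypothetical, not the package's output format. The sketch also assumes a positive criterion such as a mean squared error.

```python
def overtraining_check(val_criterion, rel_tol=0.0):
    """Return (best_iteration, overtrained) for a validation-criterion curve.

    val_criterion[i] is the (positive) criterion on validation data after
    iteration i, with entry 0 for the initial parameters.  The model is flagged
    as overtrained when the final value exceeds the minimum by more than rel_tol.
    """
    best = min(range(len(val_criterion)), key=val_criterion.__getitem__)
    overtrained = val_criterion[-1] > val_criterion[best] * (1.0 + rel_tol)
    return best, overtrained


print(overtraining_check([5.0, 3.0, 2.0, 2.5, 3.0]))  # (2, True)
print(overtraining_check([5.0, 3.0, 2.0, 1.9]))       # (3, False)
```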
Train a large network but supply the validation data.
Is there any overtraining? There usually is with such a large network, but depending on the initialization, overtraining may not occur.
If you supply validation data, the desired model is not necessarily the one obtained at the end of the training. The desired model is the one that best matches the validation data, which may occur at an intermediate iteration. Hence, by supplying validation data you can handle the bias-variance tradeoff by stopped search; that is, by stopping the training at the iteration where the prediction best matches the validation data.
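Given the logged parameters and the matching validation criterion values, stopped search keeps the parameters from the iteration with the lowest validation criterion. Below is a hypothetical Python sketch; the names and the list-based log are assumptions.

```python
def stopped_search(param_log, val_criterion):
    """Return (iteration, parameters) with the lowest validation criterion."""
    if len(param_log) != len(val_criterion):
        raise ValueError("need one validation value per logged parameter vector")
    best = min(range(len(val_criterion)), key=val_criterion.__getitem__)
    return best, param_log[best]


param_log = [[0.0, 0.0], [0.4, 0.1], [0.7, 0.2], [1.1, 0.6]]
val_criterion = [4.0, 2.0, 1.0, 3.0]
print(stopped_search(param_log, val_criterion))  # (2, [0.7, 0.2])
print(param_log[-1])                             # last iteration: [1.1, 0.6]
```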
Predict with the model obtained by stopped search.
When validation data is supplied to the training, you automatically obtain the stopped search model. If you instead want the model from the last iteration, you can obtain it from the training record, which contains a list of the parameters after each training iteration.
Check the storing format of the training record.
The value of ReportFrequency sets how often, in iterations, the parameter values are logged.
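The Python sketch below shows how a report frequency affects the length of a parameter log. It is a hypothetical trainer, not the package's implementation. It logs the initial parameters and then the parameters after every `report_frequency`-th iteration, so with a frequency of 1 the log has one entry more than the number of iterations.

```python
def train_with_log(theta0, update, iterations, report_frequency=1):
    """Apply `update` repeatedly, logging parameters every report_frequency iterations."""
    theta = list(theta0)
    log = [list(theta)]  # entry 0: the initial parameters
    for i in range(1, iterations + 1):
        theta = update(theta)
        if i % report_frequency == 0:
            log.append(list(theta))
    return theta, log


def add_one(theta):
    """Toy update rule standing in for one training iteration."""
    return [p + 1.0 for p in theta]


_, log1 = train_with_log([0.0], add_one, 10)
_, log5 = train_with_log([0.0], add_one, 10, report_frequency=5)
print(len(log1), len(log5))  # 11 3
print(log5)                  # [[0.0], [5.0], [10.0]]
```

In this sketch, the final parameters are logged only when the number of iterations is a multiple of the report frequency.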
Extract the parameters versus training iterations and check the length of the parameter list.
The length of the parameter log equals the number of iterations plus one for the initial estimate. The model at the last iteration can now be compared with the one obtained by stopped search. First, check how the information is stored in the model.
Create a new model with the same structure as the previous one and insert the last parameter values from the training record.
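Rebuilding a model with the same structure but different parameter values can be sketched with a frozen Python dataclass. The `NetModel` container and its fields are hypothetical. In the package, the model is a Mathematica expression whose actual storage format is the one inspected in the previous step.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NetModel:
    """Hypothetical model container: parameters plus structural information."""
    params: tuple
    n_inputs: int
    n_hidden: int


def with_parameters(model, params):
    """Return a copy of `model` with the same structure but new parameter values."""
    if len(params) != len(model.params):
        raise ValueError("parameter vector does not match the model structure")
    return replace(model, params=tuple(params))


stopped = NetModel(params=(0.1, 0.2, 0.3, 0.4), n_inputs=1, n_hidden=1)
final = with_parameters(stopped, [0.5, 0.6, 0.7, 0.8])  # e.g. the last logged entry
print(final)
```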
Predict using the final model.
Now, compare the performance of this model with that of the stopped search model.
Even if you do not want to use stopped search, it can be useful to supply validation data during training. If the criterion on the validation data increases toward the end of the training, the chosen model has too much flexibility, and you should choose a neural network with fewer neurons (or use stopped search).
The third way to handle the bias-variance tradeoff is to use a large model but to minimize a regularized criterion, as described in Section 7.5, Regularization and Stopped Search.
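Regularization limits the effective number of parameters. To see why, consider one common form of regularized criterion: the mean squared error plus a weight-decay penalty delta * theta^2. The exact scaling used in Section 7.5 may differ. For a scalar linear model y = theta * x, the minimizer has a closed form. Setting the derivative of (1/N) * sum((y - theta*x)^2) + delta * theta^2 to zero gives theta = sum(x*y) / (sum(x^2) + N*delta), which shrinks toward zero as delta grows. A minimal Python sketch:

```python
def regularized_fit(x, y, delta):
    """Minimize (1/N)*sum((y - theta*x)**2) + delta*theta**2 over a scalar theta."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + n * delta)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
for delta in (0.0, 1.0, 10.0):
    print(delta, regularized_fit(x, y, delta))  # theta shrinks as delta grows
```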
Train a neural network using regularization.
Evaluate the regularized network model.
How does the regularized model perform compared with the other two ways of handling the bias-variance tradeoff?
In this example you have seen three ways to handle the bias-variance tradeoff: (1) use a sparse model, which has no more parameters than necessary; (2) use a large model together with validation data to apply stopped search, so that only a subset of the parameters is effectively used; and (3) use a large model with regularization, so that only a subset of the parameters is effectively used.