Documentation
Mathematica
Built-in Functions
Advanced Documentation
Optimization
Unconstrained Optimization

Step Control
Even with Newton methods where the local model is based on the actual Hessian, unless you are close to a root or minimum, the model step may not bring you any closer to the solution. A simple example is given by the following problem.
This shows a simple example for root finding with step control disabled where the iteration alternates between two points and so does not converge. Note: on some platforms, you may see convergence. This is due to slight variations in machine number arithmetic which may be sufficient to break the oscillation.
In[1]:=


Out[1]=
This shows the same example problem with step control enabled. Since the first evaluation point has not reduced the size of the function, the line search restricts the step and so the iteration converges to the solution.
In[2]:=

Out[2]=
A good step size control algorithm will prevent repetition or escape from areas near roots or minima from happening. At the same time, however, when steps based on the model function are appropriate, the step size control algorithm should not restrict them, otherwise the convergence rate of the algorithm would be compromised. Two commonly used step size control algorithms are line search and trust region methods. In a line search method, the model function gives a step direction, and a search is done along that direction to find an adequate point that will lead to convergence. In a trust region method, a distance in which the model function will be trusted is updated at each step. If the model step lies within that distance, it is used, otherwise, an approximate minimum for the model function on the boundary of the trust region is used. Generally the trust region methods are more robust, but they require more numerical linear algebra.
Both step control methods were developed originally with minimization in mind. However, they apply well to finding roots for nonlinear equations when used with a merit function. In Mathematica, the 2-norm merit function
is used.
Line Search Methods
A method like Newton's method chooses a step, but the validity of that step only goes so far as the Newton quadratic model for the function really reflects the function. The idea of a line search is to use the direction of the chosen step, but to control the length, by solving a one-dimensional problem of minimizing
(
)==f (
pk+xk)
where
is the search direction chosen from the position
. Note that
'(
)==
f (
pk+xk).pk
so if we can compute the gradient, we can effectively do a one-dimensional search with derivative.
Typically, an effective line search only looks towards
> 0 since a reasonable method should guarantee that the search direction is a descent direction, which can be expressed as
'(
) < 0.
It is typically not worth the effort to find an exact minimum of
since the search direction is rarely exactly the right direction. Usually it is enough to move closer.
One condition that measures progress is called the Armijo or sufficient decrease condition for a candidate
.
(
*)≤
(0) +
'(0), 0<
<1
Often with this condition, methods will converge, but for some methods, Armijo alone does not guarantee convergence for smooth functions. With the additional curvature condition,

'(
*)
≤

'(0)
, 0 <
≤
< 1,
many methods can be proven to converge for smooth functions. Together these conditions are known as the string Wolfe conditions. You can control the parameters
and
with the "DecreaseFactor" ->
and "CurvatureFactor" ->
options of LineSearch.
The default value for "CurvatureFactor" ->
is
== 0.9 except for Method -> "ConjugateGradient" where
= 0.1 is used since the algorithm typically works better with a closer to exact line search. The smaller
is the closer to exact the line search.
If you look at graphs showing iterative searches in two dimensions, you can see the evaluations spread out along the directions of the line searches. Typically, it only takes a few iterations to find a point satisfying the conditions. However, the line search is not always able to find a point that satisfies the conditions. Usually this is because there is insufficient precision to compute the points closely enough to satisfy the conditions, but it can also be caused by functions that are not completely smooth or vary extremely slowly in the neighborhood of a minimum.
In[3]:=

Out[3]=
This runs into problems because the real differences in the function are negligible compared to evaluation differences around the point, as can be seen from the plot.
In[4]:=

Sometimes it can help to subtract out the constants so that small changes in the function are more significant.
In[5]:=

Out[5]=
In this case, however, this only get slightly closer because the function is quite noisy near 0, as can be seen from the plot.
In[6]:=

Thus, to get closer to the correct value of zero, higher precision is required to compute the function more accurately.
For some problems, particularly where you may be starting far from a root or a local minimum, it may be desirable to restrict steps. With line searches, it is possible to do this by using the "MaxRelativeStepSize" method option. The default value picked for this is designed to keep searches from going wildly out of control, yet at the same time not prevent a search from reasonably large steps if appropriate.
This is an example of a problem where the Newton step is very large because the starting point is at a position where the Jacobian (derivative) is nearly singular. The step size is (not severely) limited by the option.
In[7]:=

Out[7]=
This shows the same example but with a more rigorous step size limitation, which finds the root near the starting condition.
In[8]:=

Out[8]=
Note that you need to be careful not to set the "MaxRelativeStepSize" option too small, or it will affect convergence, especially for minima and roots near zero.
The following table shows a summary of the options, which can be used to control line searches.

Method options for "StepControl" -> "LineSearch".
The following sections will describe the three line search algorithms implemented in Mathematica. Comparisons will be made using Rosenbrock's function.
This uses the unconstrained problems package to set up the classic Rosenbrock function, which has a narrow curved valley.
In[9]:=
Out[9]=
MoreThuente
The default line search used by FindMinimum, FindMaximum, and FindFit is one described by More and Thuente in [MT94]. It tries to find a point that satisfies both the decrease and curvature conditions by using bracketing and quadratic and cubic interpolation.
This shows the steps and evaluations done with Newton's method with the default line search parameters. Points with just red and green are where the function and gradient were evaluated in the line search, but the Wolfe conditions were not satisfied so as to take a step.
In[10]:=

Out[10]=
The points at which only the function and gradient were evaluated were the ones attempted in the line search phase that did not satisfy both conditions. Unless restricted by "MaxRelativeStepSize", the line search always starts with the full step length (
= 1), so that if the full (in this case Newton) step satisfies the line search criteria, it will be taken, ensuring full convergence rate close to a minimum.
Decreasing the curvature factor, which means that the line search ends nearer to the exact minimum, decreases the number of steps taken by Newton's method, but it ends up increasing the total number of function and gradient evaluations.
This shows the steps and evaluations done with Newton's method with a smaller than default curvature factor in the line search parameters. Points with just red and green are where the function and gradient were evaluated in the line search, but the Wolfe conditions were not satisfied so as to take a step.
In[11]:=

Out[11]=
This example demonstrates why a more exact line search is not necessarily better. When the line search takes the step to the right at the bottom of the narrow valley, the Newton step is based on moving along the valley without seeing its curvature (the curvature of the valley is beyond quadratic order), so the Newton steps end up being far too long, even through the direction is better. On the other hand, some methods, such as the conjugate gradient method, need a better line search to improve convergence.
Backtracking
This is a simple line search that starts from the given step size and backtracks towards a step size of 0, stopping when the sufficient decrease condition is met. In general with only backtracking, there is no guarantee that you can satisfy the curvature condition, even for nice functions, so the convergence properties of the methods are not assured. However, the backtracking line search also does not need to evaluate the gradient at each point, so if gradient evaluations are relatively expensive, this may be a good choice. It is used as the default line search in FindRoot because evaluating the gradient of the merit function involves computing the Jacobian, which is relatively expensive.
In[12]:=

Out[12]=
Each backtracking step is taken by doing a polynomial interpolation and finding the minimum point for the interpolant. This point
is used as long as it lies between
and
where
is the previous value of the parameter
and
. By default ,
and
, but they can be controlled by the method option "BacktrackFactors"
. If you give a single value for the factors, this sets
and no interpolation is used. The value 1/2 gives bisection.
In this example, the effect of the relatively large backtrack factor is quite apparent.
In[13]:=

Out[13]=

Method option for line search Method->"Backtracking".
Brent
This uses the derivative-free univariate method of Brent [Br02] for the line search. It attempts to find the minimum of
(
) to within tolerances, regardless of the decrease and curvature factors. In effect, it has two phases. First, it tries to bracket the root, then it uses Brent's combined interpolation/golden section method to find the minimum. The advantage of this line search is that it does not require, as the other two methods do, that the step be in a descent direction, since it will look both directions in an attempt to bracket the minimum. As such it is very appropriate for the derivative-free principal axis method. The downside of this line search is that it typically uses many function evaluations, so is usually less efficient than the other two methods.
This example shows the effect of using the Brent method for line search. Note that in the phase of bracketing the root, it may use negative values of
. Even though the number of Newton steps is relatively small in this example, the total number of function evaluations is much larger than for other line search methods.
In[14]:=

Out[14]=
Trust Region Methods
A trust region method has a region around the current search point, where quadratic model (1) for local minimization is "trusted" to be correct and steps are chosen to stay within this region. The size of the region is modified during the search, based on how well the model agrees with actual function evaluations.
Very typically, the trust region is taken to be an ellipse such that
≤
.
is a diagonal scaling (often taken from the diagonal of the approximate Hessian) and
is the trust region radius, which is updated at each step.
When the step based on the quadratic model alone lies within the trust region, then, assuming the function value get smaller, that step will be chosen. Thus, just as with line search methods, the step control does not interfere with the convergence of the algorithm near to a minimum where the quadratic model is good. When the step based on the quadratic model lies outside the trust region, a step just up to the boundary of the trust region is chosen, such that the step is an approximate minimizer of the quadratic model on the boundary of the trust region.
Once a step
is chosen, the function is evaluated at the new point, and the actual function value is checked against the value predicted by the quadratic model. What is actually computed is the ratio of actual to predicted reduction.
k = f(xk) - f(xk +pk) qk(0)-qk(pk) = actual reduction of fpredicted model reductionof f
If
is close to 1, then the quadratic model is quite a good predictor and the region can be increased in size. On the other hand, if
is too small, the region is decreased in size. When
is below a threshold,
, the step is rejected and recomputed. You can control this threshold with the method option "AcceptableStepRatio"->
. Typically the value of
is quite small to avoid rejecting steps that would be progress towards a minimum. However, if obtaining the quadratic model at a point is quite expensive (e.g., evaluating the Hessian takes a relatively long time), a larger value of
will reduce the number of Hessian evaluations, but it may increase the number of function evaluations.
To start the trust region algorithm, an initial radius
needs to be determined. By default Mathematica uses the size of the step based on the model (1) restricted by a fairly loose relative step size limit. However, in some cases, this may take you out of the region you are primarily interested in, so you can specify a starting radius
using the option "StartingScaledStepSize" ->
0
. The option contains Scaled in its name because the trust region radius works through the diagonal scaling
, so this is not an absolute step size.
This shows the steps and evaluations taken during a search for a local minimum of a function similar to Rosenbrock's function, using Newton's method with trust region step control.
In[15]:=

Out[15]=
The plot looks quite bad because the search has extended over such a large region that the fine structure of the function cannot really be seen on that scale.
This shows the steps and evaluations for the same function, but with a restricted initial trust region radius
. Here the search stays much closer to the initial condition and follows the narrow valley.
In[16]:=

Out[16]=
It is also possible to set a overall maximum bound for the trust region radius by using the option "MaxScaledStepSize" ->
so that for any step,
≤
.
Trust region methods can also have difficulties that functions which are not smooth due to problems with numerical roundoff in the function computation. When the function is not sufficiently smooth, the radius of the trust region will keep getting reduced. Eventually, it will get to the point at which it is effectively zero.
This gets the Freudenstein-Roth test problem from the unconstrained problems package in a form where it can be solved by FindMinimum.
In[17]:=
Out[17]=
This finds a local minimum for the function using the default method. The default method in this case is the (trust region) Levenberg-Marquardt method since the function is a sum of squares.
In[18]:=


Out[18]=
The message means that the size of the trust region has become effectively zero relative to the size of the search point, so steps taken would have negligible effect. Note: On some platforms, due to subtle differences in machine arithmetic, the message may not show up. This is because the reasons leading to the message have to do with numerical uncertainty, which can vary between different platforms.
This makes a plot of the function along the
-direction at the final point found.
In[19]:=

The plot along one direction makes it fairly clear why no more improvement is possible. Part of the reason the Levenberg-Marquardt method gets into trouble in this situation is that convergence is relatively slow because the residual is nonzero at the minimum. With Newton's method, the convergence is faster, and the full quadratic model allows for a better estimate of step size, so that FindMinimum can have more confidence that the default tolerances have been satisfied.
In[20]:=

Out[20]=
The following table summarizes the options for controlling trust region step control.

Method options for "StepControl"->"TrustRegion".