The gradient descent method may not be efficient, because it can get into a zigzag pattern and repeat the same search directions many times. This problem is avoided in the conjugate gradient (CG) method, which never repeats a previous search direction and converges in at most $N$ iterations for an $N$-dimensional problem. The CG method is therefore a significant improvement over the gradient descent method.
We will first assume the function to be minimized is quadratic:

$$f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A\,\mathbf{x} - \mathbf{b}^T\mathbf{x} + c$$

At the solution $\mathbf{x}^*$ where $f(\mathbf{x})$ is minimized, we have $\nabla f(\mathbf{x}^*) = A\mathbf{x}^* - \mathbf{b} = \mathbf{0}$. We see that the minimization of a quadratic function is equivalent to solving a linear equation system $A\mathbf{x} = \mathbf{b}$ for a given symmetric positive definite matrix $A$ and a vector $\mathbf{b}$. The CG method considered below can therefore be used for solving both problems. Later we will relax the condition on $f(\mathbf{x})$ and consider using CG to minimize non-quadratic functions.
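As a quick numerical illustration of this equivalence, here is a minimal sketch (the matrix $A$ and vector $\mathbf{b}$ below are arbitrary values chosen for the example, not from the text):

```python
import numpy as np

# Arbitrary symmetric positive definite A and vector b for illustration
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def f(x):
    """Quadratic objective f(x) = 1/2 x^T A x - b^T x."""
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    """Gradient of f: A x - b."""
    return A @ x - b

x_star = np.linalg.solve(A, b)           # solution of A x = b
print("x*         =", x_star)
print("grad f(x*) =", grad_f(x_star))    # ~0: the minimizer of f solves A x = b
```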
Before discussing the CG algorithm, we first consider the concept of conjugate vectors, which is the key to the CG method. A set of vectors $\{\mathbf{d}_0,\dots,\mathbf{d}_{N-1}\}$ is mutually A-conjugate (A-orthogonal) with respect to a symmetric positive definite matrix $A$ if

$$\mathbf{d}_i^T A\,\mathbf{d}_j = 0, \qquad i \ne j$$

Such vectors are linearly independent and form a basis that spans the $N$-dimensional space.
The basic strategy of the CG method is for the iteration to follow a search path composed of $N$ segments, each along one of the $N$ mutually conjugate search directions. After the $n$th iteration that traversed along the search direction $\mathbf{d}_n$ to arrive at $\mathbf{x}_{n+1}$, the search direction $\mathbf{d}_{n+1}$ in the next iteration will be A-conjugate to the previous one $\mathbf{d}_n$. By doing so, the total error at the initial guess,

$$\mathbf{e}_0 = \mathbf{x}_0 - \mathbf{x}^* = \sum_{i=0}^{N-1} \delta_i\,\mathbf{d}_i,$$

is eliminated one component at a time, so that the iteration converges in at most $N$ steps.
Now we consider specifically the CG algorithm with the following general iteration:

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \alpha_n\,\mathbf{d}_n$$

If the step sizes are chosen to be $\alpha_n = -\delta_n$, the components of the initial error are removed one per step, so that $\mathbf{e}_N = \mathbf{e}_0 - \sum_{i=0}^{N-1}\delta_i\mathbf{d}_i = \mathbf{0}$, i.e., $\mathbf{x}_N = \mathbf{x}^*$. To see this, we premultiply both sides of the initial error equation by $\mathbf{d}_j^T A$ and use the A-conjugacy of the search directions:

$$\mathbf{d}_j^T A\,\mathbf{e}_0 = \sum_{i=0}^{N-1}\delta_i\,\mathbf{d}_j^T A\,\mathbf{d}_i = \delta_j\,\mathbf{d}_j^T A\,\mathbf{d}_j, \qquad \text{i.e.,}\quad \delta_j = \frac{\mathbf{d}_j^T A\,\mathbf{e}_0}{\mathbf{d}_j^T A\,\mathbf{d}_j}$$

Moreover, since $\mathbf{e}_j = \mathbf{e}_0 + \sum_{i<j}\alpha_i\mathbf{d}_i$ differs from $\mathbf{e}_0$ only by components along earlier directions, we have $\mathbf{d}_j^T A\,\mathbf{e}_0 = \mathbf{d}_j^T A\,\mathbf{e}_j = -\mathbf{d}_j^T\mathbf{r}_j$, where $\mathbf{r}_j = \mathbf{b} - A\mathbf{x}_j = -A\mathbf{e}_j$ is the residual. The step size can therefore be computed from quantities available at the $j$th step:

$$\alpha_j = -\delta_j = \frac{\mathbf{d}_j^T\mathbf{r}_j}{\mathbf{d}_j^T A\,\mathbf{d}_j}$$
The specific A-conjugate search directions $\mathbf{d}_n$ for any $n$ can be constructed by the Gram-Schmidt process on any set of $N$ independent vectors $\mathbf{u}_0,\dots,\mathbf{u}_{N-1}$, which can be used as the basis to span the space. Specifically, starting from $\mathbf{d}_0 = \mathbf{u}_0$, we construct each of the subsequent $\mathbf{d}_n$ for $n = 1,\dots,N-1$ based on $\mathbf{u}_n$, with all of its components along the previous directions removed:

$$\mathbf{d}_n = \mathbf{u}_n - \sum_{i=0}^{n-1}\frac{\mathbf{u}_n^T A\,\mathbf{d}_i}{\mathbf{d}_i^T A\,\mathbf{d}_i}\,\mathbf{d}_i$$
In particular, we could construct the A-orthogonal search directions $\mathbf{d}_n$ ($n = 0,\dots,N-1$) based on the gradient directions $\mathbf{g}_n = \nabla f(\mathbf{x}_n) = A\mathbf{x}_n - \mathbf{b} = -\mathbf{r}_n$, i.e., with $\mathbf{u}_n = \mathbf{r}_n$ in the Gram-Schmidt process. The benefit of choosing this particular set of directions is that the computational cost is much reduced, as we will see. Taking the inner product with $\mathbf{r}_j$ on both sides of the residual recursion $\mathbf{r}_{n+1} = \mathbf{r}_n - \alpha_n A\mathbf{d}_n$, one can show that the residuals are mutually orthogonal, $\mathbf{r}_i^T\mathbf{r}_j = 0$ for $i \ne j$, and that $\mathbf{r}_n^T A\,\mathbf{d}_i = 0$ for $i < n-1$. As a result, all but the last term of the Gram-Schmidt sum vanish, and only a single coefficient needs to be computed at each step:

$$\mathbf{d}_{n+1} = \mathbf{r}_{n+1} + \beta_n\,\mathbf{d}_n, \qquad \beta_n = \frac{\mathbf{r}_{n+1}^T\mathbf{r}_{n+1}}{\mathbf{r}_n^T\mathbf{r}_n}$$

Also, since $\mathbf{r}_n$ is orthogonal to all previous search directions, $\mathbf{d}_n^T\mathbf{r}_n = \mathbf{r}_n^T\mathbf{r}_n$, and the step size simplifies to $\alpha_n = \mathbf{r}_n^T\mathbf{r}_n/(\mathbf{d}_n^T A\,\mathbf{d}_n)$.
In summary, here is the conjugate gradient algorithm (note $\mathbf{r}_n = -\mathbf{g}_n = \mathbf{b} - A\mathbf{x}_n$):

1. Initialize: choose $\mathbf{x}_0$, set $\mathbf{d}_0 = \mathbf{r}_0 = \mathbf{b} - A\mathbf{x}_0$ and $n = 0$.
2. Step size: $\alpha_n = \mathbf{r}_n^T\mathbf{r}_n / (\mathbf{d}_n^T A\,\mathbf{d}_n)$.
3. Update the estimate: $\mathbf{x}_{n+1} = \mathbf{x}_n + \alpha_n\mathbf{d}_n$.
4. Update the residual: $\mathbf{r}_{n+1} = \mathbf{r}_n - \alpha_n A\mathbf{d}_n$.
5. Direction coefficient: $\beta_n = \mathbf{r}_{n+1}^T\mathbf{r}_{n+1} / (\mathbf{r}_n^T\mathbf{r}_n)$.
6. New search direction: $\mathbf{d}_{n+1} = \mathbf{r}_{n+1} + \beta_n\mathbf{d}_n$.
7. Set $n \leftarrow n + 1$ and repeat from step 2 until $\|\mathbf{r}_n\|$ is sufficiently small.
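To make the steps concrete, here is a minimal NumPy sketch of the algorithm above (the function name, tolerance, and test system are our own choices for illustration):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=None):
    """Minimize f(x) = 1/2 x^T A x - b^T x, i.e. solve A x = b,
    for symmetric positive definite A, following the steps above."""
    x = x0.astype(float)
    r = b - A @ x                  # initial residual r_0 = -g_0
    d = r.copy()                   # first search direction d_0 = r_0
    if max_iter is None:
        max_iter = len(b)          # exact convergence in at most N steps
    for n in range(max_iter):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ad = A @ d
        alpha = rr / (d @ Ad)      # step size along d_n
        x = x + alpha * d          # x_{n+1} = x_n + alpha_n d_n
        r = r - alpha * Ad         # r_{n+1} = r_n - alpha_n A d_n
        beta = (r @ r) / rr        # beta_n = r_{n+1}^T r_{n+1} / r_n^T r_n
        d = r + beta * d           # d_{n+1} = r_{n+1} + beta_n d_n
    return x

if __name__ == "__main__":
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b, np.zeros(2)))   # ~ [0.0909, 0.6364]
```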
The algorithm above assumes the objective function $f(\mathbf{x})$ to be quadratic with a known $A$. However, when $f(\mathbf{x})$ is not quadratic and $A$ is not available, the algorithm can be modified so that it does not depend on $A$, by the following two methods: the step size $\alpha_n$ can be found by a numerical line search that minimizes $f(\mathbf{x}_n + \alpha\mathbf{d}_n)$ along the search direction, and the coefficient $\beta_n$ can be computed from the gradients alone, as discussed further below.
In the figure below, the conjugate gradient method is compared with the gradient descent method for the case of $N = 2$. We see that the first search direction $\mathbf{d}_0$ is the same for both methods. However, the next direction $\mathbf{d}_1$ is A-orthogonal to $\mathbf{d}_0$, the same as the next error $\mathbf{e}_1$, different from the search direction in the gradient descent method. The conjugate gradient method finds the solution in $N = 2$ steps, while the gradient descent method has to go through many more steps, all orthogonal to each other, before it finds the solution.
Now we relax the requirement that the function $f(\mathbf{x})$ be quadratic, but we still assume that it can be approximated as a quadratic function near its minimum. In this case, the update of the search direction, or specifically the coefficient $\beta_n$, which is derived based on the assumption that $f(\mathbf{x})$ is quadratic, may no longer be valid. Some alternative formulas for $\beta_n$, expressed in terms of the gradients $\mathbf{g}_n = \nabla f(\mathbf{x}_n)$ only, can be used, such as the Fletcher-Reeves and Polak-Ribière formulas:

$$\beta_n^{FR} = \frac{\mathbf{g}_{n+1}^T\mathbf{g}_{n+1}}{\mathbf{g}_n^T\mathbf{g}_n}, \qquad
\beta_n^{PR} = \frac{\mathbf{g}_{n+1}^T(\mathbf{g}_{n+1} - \mathbf{g}_n)}{\mathbf{g}_n^T\mathbf{g}_n}$$
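As an illustration, here is a minimal sketch of such a gradient-only CG iteration, using the Polak-Ribière coefficient and a simple backtracking line search in place of the quadratic step-size formula (the test function, tolerances, and safeguards are our own choices, not part of the notes):

```python
import numpy as np

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=5000):
    """Nonlinear CG: Polak-Ribiere beta plus a backtracking (Armijo)
    line search, so neither A nor the exact step size is needed."""
    x = x0.astype(float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:             # safeguard: restart with steepest descent
            d = -g
        # backtracking line search for the step size alpha along d
        alpha, fx, slope = 1.0, f(x), g @ d
        while f(x + alpha * d) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        # Polak-Ribiere coefficient, reset to 0 (restart) if negative
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Example: Rosenbrock function, whose minimum is at (1, 1)
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(nonlinear_cg(f, grad, np.array([-1.0, 1.0])))
```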
Example: To compare the conjugate gradient method and the gradient descent method, consider a very simple 2-D quadratic function. While the gradient descent method may take many zigzagging steps depending on the initial guess, the conjugate gradient method, as expected, takes exactly $N = 2$ steps from any initial guess to reach the solution, as shown below.
For an example with $N = 3$, the iteration results are listed below:
| $n$ | $\mathbf{x}_n$ | $f(\mathbf{x}_n)$ |
|---|---|---|
| 0 | 1.000000, 2.000000, 3.000000 | 4.500000e+01 |
| 1 | -0.734716, -0.106441, 1.265284 | 2.809225e+00 |
| 2 | 0.123437, -0.209498, 0.136074 | 3.584736e-02 |
| 3 | -0.000000, 0.000000, 0.000000 | 3.949119e-31 |
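The $A$ and $\mathbf{b}$ of the original example are not reproduced here, but a table of this kind can be generated for any symmetric positive definite system, e.g. with SciPy's CG solver and a callback (the system below is arbitrary):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M.T @ M + 3 * np.eye(3)      # arbitrary symmetric positive definite matrix
x_star = rng.standard_normal(3)
b = A @ x_star                   # so the exact solution is x_star

f = lambda x: 0.5 * x @ A @ x - b @ x
x0 = np.array([1.0, 2.0, 3.0])

iterates = [x0]
cg(A, b, x0=x0, callback=lambda xk: iterates.append(xk.copy()))
for n, x in enumerate(iterates):
    # print iteration number, estimate, and objective error (goes to 0)
    print(f"{n}  {np.round(x, 6)}  {f(x) - f(x_star):.6e}")
```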
Conjugate gradient method used for solving linear equation systems:
As discussed before, if $\mathbf{x}^*$ is the solution that minimizes the quadratic function $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A\mathbf{x} - \mathbf{b}^T\mathbf{x}$, with $A$ being symmetric and positive definite, it also satisfies $A\mathbf{x}^* = \mathbf{b}$. In other words, the optimization problem is equivalent to the problem of solving the linear system $A\mathbf{x} = \mathbf{b}$; both can be solved by the conjugate gradient method.
Now consider solving the linear system $A\mathbf{x} = \mathbf{b}$. Let $\mathbf{d}_n$ ($n = 0,\dots,N-1$) be a set of A-orthogonal vectors satisfying $\mathbf{d}_i^T A\,\mathbf{d}_j = 0$ for $i \ne j$. The solution $\mathbf{x}^*$ of the equation can be represented by these vectors as

$$\mathbf{x}^* = \sum_{i=0}^{N-1}\delta_i\,\mathbf{d}_i, \qquad
\delta_i = \frac{\mathbf{d}_i^T A\,\mathbf{x}^*}{\mathbf{d}_i^T A\,\mathbf{d}_i} = \frac{\mathbf{d}_i^T\mathbf{b}}{\mathbf{d}_i^T A\,\mathbf{d}_i}$$

where the last equality follows from $A\mathbf{x}^* = \mathbf{b}$, so each coefficient $\delta_i$ can be computed without knowing $\mathbf{x}^*$.
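Here is a small sketch verifying this expansion numerically, with the A-orthogonal vectors constructed by the Gram-Schmidt process described earlier (the matrix, vector, and starting basis are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
M = rng.standard_normal((N, N))
A = M.T @ M + N * np.eye(N)      # arbitrary symmetric positive definite matrix
b = rng.standard_normal(N)

# Gram-Schmidt A-orthogonalization of the standard basis vectors
d = []
for u in np.eye(N):
    for dj in d:
        u = u - (u @ A @ dj) / (dj @ A @ dj) * dj
    d.append(u)

# x* = sum_i (d_i^T b / d_i^T A d_i) d_i  -- computed from b alone
x = sum((di @ b) / (di @ A @ di) * di for di in d)
print(np.allclose(x, np.linalg.solve(A, b)))   # True
```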
One application of the conjugate gradient method is to solve the normal equation

$$A^T A\,\mathbf{x} = A^T\mathbf{b}$$

to find the least-squares solution of an over-constrained equation system $A\mathbf{x} = \mathbf{b}$, where the coefficient matrix $A$ is $m$ by $n$ ($m > n$) of full column rank, i.e., $\operatorname{rank}(A) = n$. As discussed previously, the normal equation of this system has the $n \times n$ coefficient matrix $A^T A$, which is symmetric and positive definite, so it can be solved by the CG method.
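A minimal sketch of this application, assuming arbitrary data for $A$ and $\mathbf{b}$:

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(2)
m, n = 20, 5                       # m > n: over-constrained system
A = rng.standard_normal((m, n))    # full column rank (almost surely here)
b = rng.standard_normal(m)

AtA = A.T @ A                      # n x n, symmetric positive definite
Atb = A.T @ b
x_cg, info = cg(AtA, Atb)          # info == 0 indicates convergence

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("max difference from lstsq:", np.abs(x_cg - x_ls).max())
```

Note that forming $A^T A$ explicitly squares the condition number of the problem; in practice the product $A^T A\mathbf{x}$ can be evaluated as $A^T(A\mathbf{x})$ without ever forming the matrix.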