
Newton-Raphson method (multivariate)

Before discussing how to solve a multivariate system, it is helpful to review the Taylor series expansion of an N-D function.

The methods discussed above for solving a 1-D equation $f(x)=0$ can be generalized for solving an N-D multivariate equation system:

\begin{displaymath}\left\{ \begin{array}{l}
f_1(x_1,\cdots,x_N)=f_1({\bf x})=0\\
\cdots\cdots\\
f_N(x_1,\cdots,x_N)=f_N({\bf x})=0 \end{array}\right. \end{displaymath}

where we have defined

\begin{displaymath}{\bf x}=[x_1,\cdots,x_N]^T \end{displaymath}

We can further define a vector function

\begin{displaymath}{\bf f}({\bf x})=[f_1({\bf x}),\cdots,f_N({\bf x})]^T \end{displaymath}

so that the equation system above can be written as

\begin{displaymath}{\bf f}({\bf x})={\bf0} \end{displaymath}

When $N=2$, the solution of the equation system can be explained geometrically. Each equation $f_i(x_1,x_2)=0$ represents a contour curve in the $(x_1, x_2)$ plane that partitions the plane into regions in which $f_i(x_1,x_2)$ is either positive or negative. The solutions satisfying both equations are the intersections of the zero contours of $f_1(x_1,x_2)$ and $f_2(x_1,x_2)$.

NewtonRaphson2a.png

Newton-Raphson method

The Newton-Raphson method can also be generalized for solving an N-D system ${\bf f}({\bf x})={\bf0}$. We first consider the Taylor expansions of the $N$ functions:

\begin{displaymath}f_i({\bf x}+\delta{\bf x})
=f_i({\bf x})+\sum_{j=1}^N\frac{\partial f_i({\bf x})}{\partial x_j}\delta x_j
+O_i(\delta{\bf x}^2),
\;\;\;\;\;\;\;\;(i=1,\cdots,N) \end{displaymath}

These $N$ equations can be written in vector form:
\begin{displaymath}
{\bf f}({\bf x}+\delta{\bf x})
=\left[\begin{array}{c}f_1({\bf x}+\delta{\bf x})\\ \vdots\\ f_N({\bf x}+\delta{\bf x})\end{array}\right]
\approx\left[\begin{array}{c}f_1({\bf x})\\ \vdots\\ f_N({\bf x})\end{array}\right]
+\left[\begin{array}{ccc}
\frac{\partial f_1}{\partial x_1}&\cdots&\frac{\partial f_1}{\partial x_N}\\
\vdots&\ddots&\vdots\\
\frac{\partial f_N}{\partial x_1}&\cdots&\frac{\partial f_N}{\partial x_N}
\end{array}\right]
\left[\begin{array}{c}\delta x_1\\ \vdots\\ \delta x_N\end{array}\right]
={\bf f}({\bf x})+{\bf J}_{\bf f}({\bf x})\;\delta{\bf x}
\end{displaymath}

where ${\bf J}_{\bf f}({\bf x})$ is the $N\times N$ Jacobian matrix of the function vector ${\bf f}({\bf x})$, defined as:

\begin{displaymath}
{\bf J}_{\bf f}({\bf x})=\frac{d}{d{\bf x}}{\bf f}({\bf x})
=\left[\begin{array}{ccc}
\frac{\partial f_1}{\partial x_1}&\cdots&\frac{\partial f_1}{\partial x_N}\\
\vdots&\ddots&\vdots\\
\frac{\partial f_N}{\partial x_1}&\cdots&\frac{\partial f_N}{\partial x_N}
\end{array} \right]
\end{displaymath}

with the $ij$-th element being $\partial f_i/\partial x_j$. Note that the $i$-th row of ${\bf J}_{\bf f}({\bf x})$ is the transpose of the gradient vector of the $i$-th function $f_i({\bf x})$:

\begin{displaymath}{\bf g}[f_i({\bf x})]=\left[\frac{\partial f_i({\bf x})}{\partial x_1},\cdots,
\frac{\partial f_i({\bf x})}{\partial x_N}\right]^T \end{displaymath}

Like in the 1-D case, if all equations are linear, the error terms $O_i(\delta{\bf x}^2)$ are zero and we have

\begin{displaymath}
{\bf f}({\bf x}+\delta{\bf x})={\bf f}({\bf x})+{\bf J}({\bf x})\delta{\bf x}
\end{displaymath}

By setting ${\bf f}({\bf x}+\delta{\bf x})={\bf 0}$, we can find the root ${\bf x}+\delta{\bf x}$, where $\delta{\bf x}$ is obtained by solving the equation above:

\begin{displaymath}
\delta{\bf x}={\bf J}({\bf x})^{-1}[{\bf f}({\bf x}+\delta{\bf x})-{\bf f}({\bf x})]
=-{\bf J}({\bf x})^{-1}{\bf f}({\bf x})
\end{displaymath}

and the root can be found from any starting point ${\bf x}$ as

\begin{displaymath}
{\bf x}+\delta{\bf x}={\bf x}-{\bf J}({\bf x})^{-1}{\bf f}({\bf x})
\end{displaymath}

If the equations are nonlinear, this result is only an approximation of the real root, which can be improved iteratively:

\begin{displaymath}
{\bf x}_{n+1}={\bf x}_n+\delta{\bf x}_n
={\bf x}_n-{\bf J}({\bf x}_n)^{-1}\; {\bf f}({\bf x}_n)
\end{displaymath}

The iteration moves ${\bf x}_n$ in the N-D space from an initial position ${\bf x}_0=[x_1^{(0)},\cdots, x_N^{(0)}]^T$ in such a direction that the magnitudes of all functions $f_i({\bf x})$ ($i=1,\cdots,N$) are reduced toward zero.
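A minimal sketch of this iteration in Python with NumPy is given below; the function name newton_nd, the stopping tolerance and the iteration limit are illustrative choices, not part of the method's definition. Instead of explicitly inverting ${\bf J}$, each step solves the linear system ${\bf J}({\bf x}_n)\,\delta{\bf x}_n=-{\bf f}({\bf x}_n)$, which is numerically preferable.

import numpy as np

def newton_nd(f, J, x0, tol=1e-12, n_iter=100):
    # Newton-Raphson iteration x_{n+1} = x_n - J(x_n)^{-1} f(x_n).
    # f maps an N-vector to an N-vector; J returns the N x N Jacobian.
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:         # stop once ||f(x)|| is small enough
            break
        x = x + np.linalg.solve(J(x), -fx)   # solve J(x) dx = -f(x) instead of inverting J
    return x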

Newton's method can be further generalized to solve over-constrained non-linear equation systems with $N$ unknowns but $M>N$ equations. In this case the inverse of the $M\times N$ Jacobian matrix ${\bf J}$ does not exist, but the $N\times M$ pseudo-inverse ${\bf J}^-=({\bf J}^T{\bf J})^{-1}{\bf J}^T$ can be used in the iteration:

\begin{displaymath}
{\bf x}_{n+1}={\bf x}_n-{\bf J}_n^-\; {\bf f}({\bf x}_n)
\end{displaymath}

or

\begin{displaymath}
\left[\begin{array}{c}x_1\\ \vdots\\ x_N\end{array}\right]_{n+1}
=\left[\begin{array}{c}x_1\\ \vdots\\ x_N\end{array}\right]_n
-({\bf J}_n^T{\bf J}_n)^{-1}{\bf J}_n^T
\left[\begin{array}{c}f_1({\bf x}_n)\\ \vdots\\ f_M({\bf x}_n)\end{array}\right]
\end{displaymath}

The iteration attempts to find a solution in the nonlinear least squares sense. This is essentially the Gauss-Newton algorithm to be considered later.
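As an illustration only, the pseudo-inverse iteration can be sketched in Python as below; solving the linear least-squares problem for $\delta{\bf x}$ at each step with NumPy's lstsq is equivalent to applying ${\bf J}^-=({\bf J}^T{\bf J})^{-1}{\bf J}^T$ without forming $({\bf J}^T{\bf J})^{-1}$ explicitly. The function name and tolerance are illustrative.

import numpy as np

def newton_overdetermined(f, J, x0, tol=1e-12, n_iter=100):
    # Iteration x_{n+1} = x_n - (J^T J)^{-1} J^T f(x_n) for M > N equations,
    # i.e. a Gauss-Newton style nonlinear least-squares step.
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        # dx minimizes || J(x) dx + f(x) ||, which equals dx = -J^- f(x)
        dx, *_ = np.linalg.lstsq(J(x), -f(x), rcond=None)
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x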

The Newton-Raphson method assumes that the analytical expressions of all partial derivatives $\partial f_i({\bf x})/\partial x_j$ are available, so that the Jacobian matrix ${\bf J}$ can be computed. When this is not the case, it is still possible to estimate the $ij$-th component $J_{ij}=\partial f_i/\partial x_j$ of ${\bf J}$ from ${\bf x}_n$ and ${\bf x}_{n+1}$ of two consecutive iterations:

\begin{displaymath}
J_{ij}=\frac{\partial f_i(x_1,\cdots,x_N)}{\partial x_j}
\approx\frac{f_i(x_1^{(n)},\cdots,x_j^{(n+1)},\cdots,x_N^{(n)})-f_i(x_1^{(n)},\cdots,x_N^{(n)})}{x_j^{(n+1)}-x_j^{(n)}}
\end{displaymath}

This can be considered as the secant method generalized from 1-D to N-D, based on two initial guesses.
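A sketch of this secant-style Jacobian estimate in Python is given below; it assumes two iterates x_old and x_new are available and, following the formula above, changes only the $j$-th component of ${\bf x}$ when estimating the $j$-th column of ${\bf J}$.

import numpy as np

def jacobian_secant(f, x_new, x_old):
    # Column j of J is estimated from the change in f when only the j-th
    # component of x is moved from x_old[j] to x_new[j].
    x_old = np.asarray(x_old, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    f_old = np.asarray(f(x_old), dtype=float)
    J = np.zeros((f_old.size, x_old.size))
    for j in range(x_old.size):
        x_j = x_old.copy()
        x_j[j] = x_new[j]
        J[:, j] = (np.asarray(f(x_j)) - f_old) / (x_new[j] - x_old[j])
    return J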

Example 1


\begin{displaymath}
\left\{
\begin{array}{l}
3\;x_1-\cos(x_2 x_3)-3/2=0\\
4x_1^2-625\;x_2^2+2\;x_3-1=0\\
20\;x_3+e^{-x_1x_2}+9=0
\end{array}\right.
\end{displaymath}


\begin{displaymath}
{\bf J}=\left[
\begin{array}{ccc}
3 & x_3\sin(x_2 x_3) & x_2\sin(x_2 x_3)\\
8x_1 & -1250\;x_2 & 2\\
-x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20
\end{array}\right]
\end{displaymath}


The iteration converges to the root ${\bf x}=(0.833282,\;0.035335,\;-0.498549)$, with error $5.551\times 10^{-16}$.
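For illustration, the system and Jacobian above can be coded directly and iterated with the Newton update; the starting point $(1,1,1)$ and the iteration count below are arbitrary choices, not taken from the text.

import numpy as np

def f(x):
    x1, x2, x3 = x
    return np.array([3*x1 - np.cos(x2*x3) - 1.5,
                     4*x1**2 - 625*x2**2 + 2*x3 - 1,
                     20*x3 + np.exp(-x1*x2) + 9])

def J(x):
    x1, x2, x3 = x
    return np.array([[3,                  x3*np.sin(x2*x3),   x2*np.sin(x2*x3)],
                     [8*x1,               -1250*x2,           2],
                     [-x2*np.exp(-x1*x2), -x1*np.exp(-x1*x2), 20]])

x = np.array([1.0, 1.0, 1.0])            # illustrative starting point
for _ in range(25):                      # Newton update x <- x - J(x)^{-1} f(x)
    x = x + np.linalg.solve(J(x), -f(x))
print(x)                                 # expected to approach (0.833282, 0.035335, -0.498549)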

Example 2

\begin{displaymath}
{\bf f}({\bf x})={\bf0},\;\;\;\;\;\;\;\;\;\;
\left\{ \begin{array}{l}
f_1({\bf x})=x_1^2-2x_1+x_2^2-x_3+1=0\\
f_2({\bf x})=x_1x_2^2-x_1-3x_2+x_2x_3+2=0\\
f_3({\bf x})=x_1x_3^2-3x_3+x_2x_3^2+x_1x_2=0
\end{array} \right.
\end{displaymath}


\begin{displaymath}
{\bf J}=\left[
\begin{array}{ccc}
2x_1-2 & 2x_2 & -1\\
x_2^2-1 & 2x_1x_2-3+x_3 & x_2\\
x_3^2+x_2 & x_3^2+x_1 & 2x_1x_3-3+2x_2x_3
\end{array}\right]
\end{displaymath}

With ${\bf x}_0=[1,\;2,\;3]^T$, we get a root:

The iteration converges to ${\bf x}=(1.000000,\;1.000000,\;1.000000)$, with error $e=8.481\times 10^{-12}$.

With ${\bf x}_0=[0,\;0,\;0]^T$, we get another root:

The iteration converges to ${\bf x}=(1.098943,\;0.367617,\;0.144932)$, with error $e=4.837\times 10^{-9}$.
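A similar sketch reproduces the two runs above from the two starting points; the iteration limit and stopping tolerance are illustrative choices.

import numpy as np

def f(x):
    x1, x2, x3 = x
    return np.array([x1**2 - 2*x1 + x2**2 - x3 + 1,
                     x1*x2**2 - x1 - 3*x2 + x2*x3 + 2,
                     x1*x3**2 - 3*x3 + x2*x3**2 + x1*x2])

def J(x):
    x1, x2, x3 = x
    return np.array([[2*x1 - 2,   2*x2,             -1],
                     [x2**2 - 1,  2*x1*x2 - 3 + x3, x2],
                     [x3**2 + x2, x3**2 + x1,       2*x1*x3 - 3 + 2*x2*x3]])

def newton(x0, n_iter=50, tol=1e-12):
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):              # Newton update with a simple stop test
        dx = np.linalg.solve(J(x), -f(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

print(newton([1.0, 2.0, 3.0]))   # expected to approach (1, 1, 1)
print(newton([0.0, 0.0, 0.0]))   # expected to approach (1.098943, 0.367617, 0.144932)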

Broyden's method

Recall that the secant method approximates the derivative $f'(x_{n+1})$ based on $\delta x_n=x_{n+1}-x_n$ and $\delta f(x_n)=f(x_{n+1})-f(x_n)$:

\begin{displaymath}
\hat{f}'(x_{n+1})=\frac{f(x_{n+1})-f(x_n)}{x_{n+1}-x_n}
=\frac{\delta f(x_n)}{\delta x_n}
=\hat{f}'(x_n)+\frac{\delta f(x_n)-\hat{f}'(x_n)\delta x_n}{\delta x_n}
\end{displaymath}

Here $\hat{f}'(x_n)\delta x_n$ is the estimate of $\delta f(x_n)$ based on the old derivative estimate $\hat{f}'(x_n)$, which differs from the more accurate estimate $\hat{f}'(x_{n+1})\delta x_n$ based on $\hat{f}'(x_{n+1})$. The difference between these two estimates, divided by $\delta x_n$, approximates the difference between $\hat{f}'(x_{n+1})$ and $\hat{f}'(x_n)$, which gives the update formula for $\hat{f}'(x_{n+1})$ above.

Broyden.png

The updated derivative estimate can then be used to compute the next point:

\begin{displaymath}x_{n+1}=x_n-\frac{f(x_n)}{\hat{f}'(x_n)} \end{displaymath}

Broyden's method, one of the quasi-Newton methods, can be considered a generalization of this secant method for solving an N-D system ${\bf f}({\bf x})={\bf0}$. Instead of assuming the true Jacobian matrix ${\bf J}$ is available, the next Jacobian estimate $\hat{{\bf J}}_{n+1}$ is obtained iteratively from the current estimate $\hat{{\bf J}}_n$. Let

\begin{displaymath}
\delta{\bf x}_n={\bf x}_{n+1}-{\bf x}_n,\;\;\;\;\;\;\;\delta{\bf f}_n={\bf f}_{n+1}-{\bf f}_n
\end{displaymath}

then

\begin{displaymath}\hat{{\bf J}}_{n+1}=\hat{{\bf J}}_n+\frac{(\delta{\bf f}_n-\hat{{\bf J}}_n\delta{\bf x}_n)\,\delta{\bf x}^T_n}{\vert\vert\delta {\bf x}_n\vert\vert^2} \end{displaymath}

With $\hat{{\bf J}}_n$ available, we have the same iteration for the solution update as before:

\begin{displaymath}{\bf x}_{n+1}={\bf x}_n-\hat{{\bf J}}_n^{-1} {\bf f}({\bf x}_n) \end{displaymath}

Here the inverse of $\hat{{\bf J}}_n$ needs to be computed in each iteration. A better method is to estimate $\hat{{\bf J}}_n^{-1}$ directly. Specifically, we apply the Sherman-Morrison formula

\begin{displaymath}({\bf A}+{\bf u}{\bf v}^T)^{-1}={\bf A}^{-1}-\frac{{\bf A}^{-1}{\bf u}{\bf v}^T{\bf A}^{-1}}{1+{\bf v}^T{\bf A}^{-1}{\bf u}} \end{displaymath}

to the update formula for $\hat{{\bf J}}_{n+1}$ above, with

\begin{displaymath}{\bf u}=\frac{\delta{\bf f}_n-\hat{{\bf J}}_n\delta{\bf x}_n}{\vert\vert\delta{\bf x}_n\vert\vert^2},
\;\;\;\;\;\;\;{\bf v}=\delta{\bf x}_n,\;\;\;\;\;\;\;{\bf A}=\hat{{\bf J}}_n
\end{displaymath}

and get

\begin{displaymath}
\hat{{\bf J}}^{-1}_{n+1}=\hat{{\bf J}}^{-1}_n
-\frac{(\hat{{\bf J}}^{-1}_n\delta{\bf f}_n-\delta{\bf x}_n)\,\delta{\bf x}^T_n\hat{{\bf J}}_n^{-1}}
{\delta{\bf x}^T_n\hat{{\bf J}}_n^{-1}\delta{\bf f}_n}
\end{displaymath}
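A minimal sketch of Broyden's method using this inverse update is given below. Starting the estimate $\hat{{\bf J}}_0^{-1}$ from the identity matrix when no initial Jacobian is supplied is a simple default assumed here, not prescribed by the text; in practice the true Jacobian (or a finite-difference estimate) at ${\bf x}_0$ is often used instead.

import numpy as np

def broyden(f, x0, J0_inv=None, tol=1e-12, n_iter=100):
    # Quasi-Newton (Broyden) iteration: the inverse Jacobian estimate is
    # corrected by the Sherman-Morrison rank-1 update derived above, so no
    # linear system is solved and no analytical Jacobian is required.
    x = np.array(x0, dtype=float)
    J_inv = np.eye(x.size) if J0_inv is None else np.array(J0_inv, dtype=float)
    fx = f(x)
    for _ in range(n_iter):
        if np.linalg.norm(fx) < tol:
            break
        dx = -J_inv @ fx                   # x_{n+1} = x_n - J^{-1}_n f(x_n)
        x = x + dx
        f_new = f(x)
        df = f_new - fx
        # J^{-1} <- J^{-1} - (J^{-1} df - dx) dx^T J^{-1} / (dx^T J^{-1} df)
        J_inv -= np.outer(J_inv @ df - dx, dx @ J_inv) / (dx @ J_inv @ df)
        fx = f_new
    return x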


Ruye Wang 2015-02-12