Newton-Raphson Method (Multivariate)

The Newton-Raphson method discussed above for solving a single-variable equation $f(x)=0$ can be generalized to the case of multivariate equation systems containing $M$ equations of $N$ variables in ${\bf x}=[x_1,\cdots,x_N]^T$:

$\displaystyle {\bf f}({\bf x})=\left[\begin{array}{c}
f_1({\bf x}) \\ \vdots\\ ...
=\left[\begin{array}{c}0\\ \vdots\\ 0\end{array}\right]={\bf0}$ (95)

To solve the equation system, we first consider the Taylor series expansion of each of the $M$ functions in the neighborhood of the initial point ${\bf x}_0=[x_{01},\cdots,x_{0N}]^T$:

$\displaystyle f_m({\bf x})
=f_m({\bf x}_0)+\sum_{n=1}^N\frac{\partial f_m({\bf ...
...) +r_m(\vert\vert{\bf x}-{\bf x}_0\vert\vert^2),
\;\;\;\;\;\;\;\;(m=1,\cdots,M)$ (96)

where $r_m(\vert\vert{\bf x}-{\bf x}_0\vert\vert^2)$ represents the second and higher order terms in the series beyond the linear term, which can be neglected if $\vert\vert{\bf x}-{\bf x}_0\vert\vert$ is small. These $M$ equations can be expressed in matrix form
$\displaystyle {\bf f}({\bf x})$ $\displaystyle =$ $\displaystyle \left[\begin{array}{c}f_1({\bf x})\\ \vdots\\ f_M({\bf x})\end{ar...
+\left[\begin{array}{c}r_1\\ \vdots \\ r_M\end{array}\right]$  
  $\displaystyle =$ $\displaystyle {\bf f}({\bf x}_0)+{\bf J}({\bf x}_0)\;({\bf x}-{\bf x}_0)+{\bf r...
..._0)+{\bf J}({\bf x}_0)\;({\bf x}-{\bf x}_0)
={\bf f}_0+{\bf J}_0\;\Delta{\bf x}$ (97)

where $\Delta{\bf x}={\bf x}-{\bf x}_0$, while ${\bf f}_0={\bf f}({\bf x}_0)$ and ${\bf J}_0={\bf J}({\bf x}_0)$ are the function ${\bf f}({\bf x})$ and its Jacobian matrix ${\bf J}_f({\bf x})$ both evaluated at ${\bf x}_0$. We further consider solving the equation system ${\bf f}({\bf x})={\bf0}$ in the following two cases:

Comparing Eqs. (99) and (109), we see that the two algorithms are essentially the same, with the only difference that the regular inverse ${\bf J}^{-1}$ is used when $M=N$, but the pseudoinverse ${\bf J}^-$ is used when $M>N$ and ${\bf J}^{-1}$ does not exist.

The Newton-Raphson method assumes the availability of the analytical expressions of all partial derivatives $J_{mn}=\partial f_m({\bf x})/\partial x_n\;(m=1,\cdots,M,\;n=1,\cdots,N)$ in the Jacobian matrix ${\bf J}$. However, when this is not the case, $J_{mn}$ need to be approximated by forward or central difference (secant) method:

$\displaystyle J_{mn}=\frac{\partial f_m(x_1,\cdots,x_N)}{\partial x_n}$ $\displaystyle \approx$ $\displaystyle \frac{f_m(x_1,\cdots,x_n+h,\cdots,x_N)
  $\displaystyle \approx$ $\displaystyle \frac{f_m(x_1,\cdots,x_n+h,\cdots,x_N)
-f_m(x_1,\cdots,x_n-h,\cdots,x_N)}{2h}$ (111)

where $h$ is a small increment.

Example 1

3 \,x_1-\cos(x_2 x_3)-3/2=0\\
...,x_2^2+2 x_3-1=0\\
20 \,x_3+e^{-x_1x_2}+9=0

\begin{displaymath}{\bf J}=\left[
3 & x_3\sin(x_2 x_3) & x_2\...
...x_2\,e^{-x_1 x_2} & -x_1\,e^{-x_1 x_2} & 20
\end{array} \right]\end{displaymath}    

\begin{displaymath}\begin{array}{c\vert c\vert c}\hline
n & {\bf x} & error \\ \...
....833282, 0.035335, -0.498549) & 5.551e-16 \\ \hline

Example 2

$\displaystyle {\bf f}({\bf x})={\bf0},\;\;\;\;\;\;\;\;\;\;
\left\{ \begin{array...
f_3({\bf x})=x_1x_3^2-3x_3+x_2x_3^2+x_1x_2=0
\end{array} \right.$    

$\displaystyle {\bf J}=\left[\begin{array}{ccc}
2x_1-2 & 2x_2 & -1\\
x_2^2-1 & ...
..._2-3+x_3 & x_2 \\
x_3^2+x_2 & x_3^2+x_1 & 2x_1x_3-3+2x_2x_3

With ${\bf x}_0=[1,\;2,\;3]^T$, we get a root:

\begin{displaymath}\begin{array}{c\vert c\vert c}\hline
n & {\bf x} & error \\ \...
...& (1.00000, 1.00000, 1.00000) & 6.701e-06 \\ \hline

With ${\bf x}_0=[0,\;0,\;0]^T$, we get another root:

\begin{displaymath}\begin{array}{c\vert c\vert c}\hline
n & {\bf x} & error \\ \...
...& (1.09888, 0.36764, 0.14494) & 4.817e-06 \\ \hline

Broyden's method

In the Newton-Raphson method, two main operations are carried out in each iteration: (a) evaluate the Jacobian matrix ${\bf J}_{\bf f}({\bf x}_n)$ and (b) obtain its inverse ${\bf J}^{-1}_{\bf f}({\bf x}_n)$. To avoid these expensive computation for these operations, we can consider using Broyden's method, one of the quasi-Newton methods, which approximates the inverse of the Jacobian ${\bf J}^{-1}_{n+1}={\bf J}^{-1}({\bf x}_{n+1})$ from the ${\bf J}^{-1}_n={\bf J}^{-1}({\bf x}_n)$ in the previous iteration step, so that it can be updated iteratively from the initial ${\bf J}^{-1}_0={\bf J}^{-1}({\bf x}_0)$.

We first consider in the single-variable case how to estimate the next derivative $f'_{n+1}=f'(x_{n+1})$ from the current $f'_n=f'(x_n)$ by the secant method:

$\displaystyle f'_{n+1}\approx
\hat{f}'_{n+1}$ $\displaystyle =$ $\displaystyle \frac{f_{n+1}-f_n}{x_{n+1}-x_n}
=\frac{\delta f_n}{\delta x_n}
=\frac{\hat{f}'_n\delta x_n-\hat{f}'_n\delta x_n+\delta f_n}{\delta x_n}$  
  $\displaystyle =$ $\displaystyle \hat{f}'_n+\frac{\delta f_n-\hat{f}'_n\delta x_n}{\delta x_n}
=\hat{f}'_n+\hat{\delta} f'_n$ (112)

where The equation above indicates that the derivative $f'_{n+1}$ in the $(n+1)th$ step can be estimated by adding the estimated increment $\hat{\delta}f'_n$ to the derivative $\hat{f}'n$ in the current $nth$ step.


Having obtained $\hat{f}'_{n+1}$, we can use the same iteration in the Newton-Raphson method to find $x_{n+1}$:

$\displaystyle x_{n+1}=x_n-\frac{f(x_n)}{\hat{f}'_n}$ (114)

This method for single-variable case can be generalized to multiple variable case for solving ${\bf f}({\bf x})={\bf0}$. Following the way we estimate the increment of the derivative of a single-variable function in Eq. (113), here we can estimate the increment of the Jacobian of a multi-variable function:

$\displaystyle \delta\hat{\bf J}_n
=\frac{(\delta{\bf f}_n-\hat{{\bf J}}_n\delta...
...J}}_n\delta{\bf x}_n)\delta{\bf x}^T_n}{\vert\vert\delta {\bf x}_n\vert\vert^2}$ (115)

where $\delta{\bf x}_n={\bf x}_{n+1}-{\bf x}_n$ and $\delta{\bf f}_n={\bf f}_{n+1}-{\bf f}_n={\bf f}({\bf x}_{n+1})-{\bf f}({\bf x}_n)$. Now in each iteration, we can update the estimated Jacobian as well as the estimated root:

$\displaystyle \hat{\bf J}_{n+1}=\hat{\bf J}_n+\delta\hat{\bf J}_n,\;\;\;\;\;\;\...
...}={\bf x}_n+\delta {\bf x}_n
={\bf x}_n-\hat{{\bf J}}_n^{-1} {\bf f}({\bf x}_n)$ (116)

The algorithm can be further improved so that the inverse of the Jacobian ${\bf J}_n$ is avoided. Specifically, consider the inverse Jacobian:

$\displaystyle \hat{{\bf J}}_{n+1}^{-1}=\left( \hat{{\bf J}}_n+\delta\hat{\bf J}... x}_n)\delta{\bf x}^T_n}{\vert\vert\delta {\bf x}_n\vert\vert^2} \right]^{-1}$ (117)

We can apply the Sherman-Morrison formula:

$\displaystyle ({\bf A}+{\bf u}{\bf v}^T)^{-1}
={\bf A}^{-1}-\frac{{\bf A}^{-1}{\bf u}{\bf v}^T{\bf A}^{-1}}
{1+{\bf v}^T{\bf A}^{-1}{\bf u}}$ (118)

to the right-hand side of the equation above by defining

$\displaystyle {\bf A}=\hat{\bf J}_n,\;\;\;\;\;\;\;
{\bf u}=\frac{\delta{\bf f}_...
...}{\vert\vert\delta{\bf x}_n\vert\vert^2},
\;\;\;\;\;\;\;{\bf v}=\delta{\bf x}_n$ (119)

and rewrite it as:

$\displaystyle \hat{{\bf J}}^{-1}_{n+1}=\hat{{\bf J}}^{-1}_n\,-\,\frac{\hat{{\bf...
...n\,\hat{{\bf J}}_n^{-1}}
{\delta{\bf x}^T_n\hat{{\bf J}}_n^{-1}\delta{\bf f}_n}$ (120)

We see that the next $\hat{\bf J}_{n+1}^{-1}$ can be iteratively estimated directly from the previous $\hat{\bf J}_n^{-1}$, thereby avoiding computing the inverse of $\hat{\bf J}_n$ altogether. The algorithm is listed below: