
Sensitivity and Conditioning

In numerical methods, it is important to know how reliable the results produced by an algorithm are, and how sensitive the algorithm is to the inevitable noise in its input variables and system parameters, due possibly to observational error or truncation error. In general, if a small change in the input may cause a large difference in the output, then the result is not reliable. How sensitive a problem is, and whether its numerical solution is reliable, depends on whether the problem is well behaved or ill behaved:

A problem is well-conditioned if a small perturbation of the input causes only a small change in the output.

A problem is ill-conditioned if a small perturbation of the input may cause a large change in the output.

How well or ill conditioned a system is can be measured quantitatively by the absolute or relative condition numbers:

\begin{displaymath}
\hat\kappa=\frac{\vert\vert\mbox{change in output}\vert\vert}{\vert\vert\mbox{change in input}\vert\vert},\;\;\;\;\;\;
\kappa=\frac{\vert\vert\mbox{change in output}\vert\vert/\vert\vert\mbox{output}\vert\vert}
{\vert\vert\mbox{change in input}\vert\vert/\vert\vert\mbox{input}\vert\vert}
\end{displaymath}

Here $\vert\vert x \vert\vert$ is the norm (representing ``size'') of any variable $x$ (scalar, vector, matrix, function, etc.). In the following, we will use mostly the $p=2$ norm for vectors and the corresponding spectral norm for matrices.

As the relative condition number compares the normalized changes in both the input and output, its value is invariant to the specific units used to measure them. Also, the change in the input can equivalently be normalized by $\vert\vert\mbox{input + change in input}\vert\vert$ instead of $\vert\vert\mbox{input}\vert\vert$; the same is true for the change in the output.

Obviously, the smaller the condition number, the better conditioned the problem is and the less sensitive the solution is to perturbations of the data (error, noise, approximation, etc.); the larger the condition number, the more ill conditioned the problem is. In the following, we consider the condition numbers defined for several different systems.

Systems with a single input and output variable

Consider a system of a single input $x$ and single output $y$, which can be described by a function $y=f(x)$ to be evaluated at a given input $x$. The condition number of the function at point $x$ is defined as the greatest (worst case) ratio between the changes of the output and the input:

\begin{displaymath}
\hat\kappa=\lim_{\epsilon\rightarrow 0}\sup\limits_{\vert\delta x\vert\le\epsilon}
\frac{\vert\delta f(x)\vert}{\vert\delta x\vert},\;\;\;\;\;\;
\kappa=\lim_{\epsilon\rightarrow 0}\sup\limits_{\vert\delta x\vert\le\epsilon}
\frac{\vert\delta f(x)\vert/\vert f(x)\vert}{\vert\delta x\vert/\vert x\vert}
\end{displaymath}

Here $\vert x\vert$ is the norm of the scalar variable, which is either the absolute value of $x\in\mathbb{R}$ or the modulus of $x\in\mathbb{C}$. Note that in general the condition number is a function of $x$.

When the perturbation $\delta x$ is small, the function can be approximated by the first two terms of its Taylor expansion:

\begin{displaymath}
f(x+\delta x)=f(x)+f'(x)\delta x+f''(x)\frac{(\delta x)^2}{2}+\cdots
\approx f(x)+f'(x)\delta x
\end{displaymath}

i.e.,

\begin{displaymath}
\delta y=f(x+\delta x)-f(x)\approx f'(x)\delta x
\end{displaymath}

Taking the absolute value or modulus on both sides, we get

\begin{displaymath}
\big\vert\delta y\big\vert=\big\vert f(x+\delta x)-f(x)\big\vert
\approx\big\vert f'(x)\big\vert\;\big\vert\delta x\big\vert
=\hat\kappa \big\vert\delta x\big\vert
\end{displaymath}

where $\hat\kappa=\vert\delta y\vert/\vert\delta x\vert=\vert f'(x)\vert$ is the absolute condition number. Dividing both sides by $\vert y\vert=\vert f(x)\vert$, we get

\begin{displaymath}
\frac{\big\vert\delta y\big\vert}{\big\vert y\big\vert}
=\frac{\big\vert f'(x)\big\vert\,\big\vert\delta x\big\vert}{\big\vert f(x)\big\vert}
=\frac{\big\vert f'(x)\big\vert\,\big\vert x\big\vert}{\big\vert f(x)\big\vert}
\frac{\big\vert\delta x\big\vert}{\big\vert x\big\vert}
=\kappa \frac{\big\vert \delta x\big\vert}{\big\vert x\big\vert}
\end{displaymath}

where $\kappa$ is the relative condition number:

\begin{displaymath}
\kappa=\frac{\vert\delta y\vert/\vert y\vert}{\vert\delta x\vert/\vert x\vert}
=\frac{\vert f'(x)\vert}{\vert f(x)\vert/\vert x\vert}
=\frac{\vert f'(x)\vert\,\vert x\vert}{\vert f(x)\vert}
\end{displaymath}
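As a quick numerical check of this definition, one can perturb the input slightly and compare the normalized changes of the output and the input; the ratio should approach $\kappa=\vert f'(x)\vert\,\vert x\vert/\vert f(x)\vert$. A minimal Matlab sketch, where the choices $f(x)=e^x$ and $x=2$ are only for illustration:

  f  = @(x) exp(x);                % example function; f'(x) = exp(x)
  x  = 2;                          % point of evaluation
  dx = 1e-8;                       % small input perturbation
  dy = f(x+dx) - f(x);             % resulting output perturbation
  kappa_est = (abs(dy)/abs(f(x)))/(abs(dx)/abs(x))  % numerical estimate
  kappa     = abs(exp(x))*abs(x)/abs(f(x))          % analytic |f'(x)||x|/|f(x)|

For $f(x)=e^x$ the relative condition number is $\kappa=\vert x\vert$, so both lines print a value close to 2.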

The discussion above is for the task of evaluating the output $y=f(x)$ of a system given the input $x$. If instead the task is to solve a given equation $f(x)=0$, i.e., to find the $x$ (treated as the output) that satisfies the equation, with $y=f(x)=0$ treated as the input, then the problem can be converted into the evaluation of the inverse function $x=g(y)=f^{-1}(y)$ at $y=0$. The absolute condition number of this function $g(y)=f^{-1}(y)$ is

\begin{displaymath}
\hat\kappa=\vert g'(y)\vert=\bigg\vert\frac{d}{dy}[ f^{-1}(y)]\bigg\vert
=\frac{1}{\big\vert f'(x)\big\vert}
\end{displaymath}

which is the reciprocal of the absolute condition number $\vert f'(x)\vert$ of the corresponding evaluation problem. We therefore see that the larger $\vert f'(x)\vert$ is, the smaller $\vert g'(y)\vert$ is, and the better conditioned the equation-solving problem is.

In the figure below, $\vert f'_1(x)\vert>\vert f'_2(x)\vert$; therefore for the problem of solving $f(x)=0$, $f_1(x)$ is better conditioned than $f_2(x)$, but for the problem of evaluating $y=f(x)$, $f_1(x)$ is worse conditioned than $f_2(x)$.

[Figure Conditioned.png: two functions $f_1$ and $f_2$ with $\vert f'_1(x)\vert>\vert f'_2(x)\vert$ near a common root]
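This reciprocal relationship can be checked numerically by perturbing the right-hand side of the equation from $0$ to a small $\epsilon$ and observing how far the root moves; the shift is approximately $\epsilon/\vert f'(x)\vert$. A minimal Matlab sketch with two illustrative functions, one steep and one flat at the root $x=0$:

  f1   = @(x) 10*x;                    % steep:  |f1'| = 10  at the root
  f2   = @(x) 0.1*x;                   % flat:   |f2'| = 0.1 at the root
  eps0 = 1e-3;                         % perturbation of the right-hand side
  fzero(@(x) f1(x)-eps0, 0)            % root moves to 1e-4 = eps0/10
  fzero(@(x) f2(x)-eps0, 0)            % root moves to 1e-2 = eps0/0.1

The steep function $f_1$ gives the better-conditioned root, consistent with the figure.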

Example: Consider the function

\begin{displaymath}
y=f(x)=\frac{x}{1-x},\;\;\;\;\;\;f'(x)=\frac{1}{(1-x)^2}
\end{displaymath}


The relative condition number is

\begin{displaymath}
\kappa=\frac{\vert f'(x)\vert\,\vert x\vert}{\vert f(x)\vert}
=\frac{\vert x\vert/\vert 1-x\vert^2}{\vert x\vert/\vert 1-x\vert}
=\frac{1}{\vert 1-x\vert}
\end{displaymath}

[Figure NMexample1.png: plot of $y=f(x)=x/(1-x)$]


\begin{displaymath}
\begin{tabular}{c||c|c|c|c} \hline
$x$ & -0.99 & -0.98 & 0.98 & 0.99 \\ \hline
$y=f(x)$ & -0.4975 & -0.4949 & 49.0 & 99.0 \\ \hline
\end{tabular}\end{displaymath}

We see the function is well-conditioned at $x=-0.99$, where $\kappa=1/1.99\approx 0.5$, but ill-conditioned at $x=0.99$, where $\kappa=1/0.01=100$.
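The same conclusion can be reached numerically by applying a tiny perturbation at the two points and measuring the amplification of the relative change (a minimal Matlab sketch):

  f = @(x) x./(1-x);
  for x = [-0.99 0.99]
      dx = 1e-8;                                  % tiny input perturbation
      rel_in  = abs(dx)/abs(x);                   % relative change of input
      rel_out = abs(f(x+dx)-f(x))/abs(f(x));      % relative change of output
      fprintf('x = %5.2f:  kappa = %.4f\n', x, rel_out/rel_in)
  end

This prints $\kappa\approx 0.5$ at $x=-0.99$ and $\kappa\approx 100$ at $x=0.99$, matching $\kappa=1/\vert 1-x\vert$.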

Systems with multiple input and output variables

The results above can be generalized to a system of multiple inputs ${\bf x}=[x_1,\cdots,x_N]^T$ and multiple outputs ${\bf y}=[y_1,\cdots,y_M]^T$ represented by a function ${\bf y}={\bf f}({\bf x})$. A change $\delta{\bf x}$ in the input will cause a corresponding change $\delta{\bf y}$ in the output:

\begin{displaymath}
\delta{\bf y}={\bf f}({\bf x}+\delta{\bf x})-{\bf f}({\bf x})
\end{displaymath}

When $\delta{\bf x}$ is small, the output can be approximated by the first two terms of its Taylor series expansion:

\begin{displaymath}
{\bf f}({\bf x}+\delta{\bf x})\approx{\bf f}({\bf x})+{\bf J}_f({\bf x})
\delta{\bf x}
\end{displaymath}

and we get

\begin{displaymath}
\vert\vert\delta{\bf y}\vert\vert=\vert\vert{\bf f}({\bf x}+\delta{\bf x})-{\bf f}({\bf x})\vert\vert
\approx\vert\vert{\bf J}_f({\bf x})\,\delta{\bf x}\vert\vert
\le\vert\vert{\bf J}_f({\bf x})\vert\vert\;\vert\vert\delta{\bf x}\vert\vert
=\hat\kappa\vert\vert\delta{\bf x}\vert\vert
\end{displaymath}

where $\hat\kappa$ is the absolute condition number defined as:

\begin{displaymath}
\hat\kappa=\frac{\vert\vert\delta{\bf y}\vert\vert}{\vert\vert\delta{\bf x}\vert\vert}
=\vert\vert{\bf J}_f({\bf x})\vert\vert
\end{displaymath}

Dividing both sides of the equation above by $\vert\vert{\bf y}\vert\vert=\vert\vert{\bf f}({\bf x})\vert\vert$ we get

\begin{displaymath}
\frac{\vert\vert\delta{\bf y}\vert\vert}{\vert\vert{\bf y}\vert\vert}
\le\frac{\vert\vert{\bf J}_f({\bf x})\vert\vert\;\vert\vert{\bf x}\vert\vert}{\vert\vert{\bf f}({\bf x})\vert\vert}\;
\frac{\vert\vert\delta{\bf x}\vert\vert}{\vert\vert{\bf x}\vert\vert}
=\kappa\,\frac{\vert\vert\delta{\bf x}\vert\vert}{\vert\vert{\bf x}\vert\vert}
\end{displaymath}

where $\kappa$ is the relative condition number defined as:

\begin{displaymath}
\kappa=
\frac{\vert\vert\delta{\bf y}\vert\vert/\vert\vert{\bf y}\vert\vert}{\vert\vert\delta{\bf x}\vert\vert/\vert\vert{\bf x}\vert\vert}
=\frac{\vert\vert{\bf J}_f({\bf x})\vert\vert}{\vert\vert{\bf f}({\bf x})\vert\vert/\vert\vert{\bf x}\vert\vert}
=\frac{\vert\vert{\bf J}_f({\bf x})\vert\vert\;\vert\vert{\bf x}\vert\vert}{\vert\vert{\bf f}({\bf x})\vert\vert}
\end{displaymath}
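For a concrete multivariate case, the following Matlab sketch evaluates $\kappa=\vert\vert{\bf J}_f({\bf x})\vert\vert\;\vert\vert{\bf x}\vert\vert/\vert\vert{\bf f}({\bf x})\vert\vert$ for an arbitrarily chosen map ${\bf f}$ and point ${\bf x}$ (illustration only), and checks that a random small perturbation never amplifies the relative change by more than $\kappa$:

  f = @(x) [x(1)^2 + x(2); x(1)*x(2)];     % example map (illustration only)
  J = @(x) [2*x(1), 1; x(2), x(1)];        % its Jacobian
  x = [1; 2];
  kappa = norm(J(x))*norm(x)/norm(f(x))    % norm() of a matrix is its spectral norm

  dx = 1e-7*randn(2,1);                    % random small perturbation
  ratio = (norm(f(x+dx)-f(x))/norm(f(x)))/(norm(dx)/norm(x))   % always <= kappa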

In particular, consider a linear system of $N$ inputs ${\bf x}=[x_1,\cdots,x_N]^T$ and $M$ outputs ${\bf y}=[y_1,\cdots,y_M]^T$ described by

\begin{displaymath}
{\bf y} = {\bf f}({\bf x})={\bf A}_{M\times N} {\bf x}
\end{displaymath}

We see that for both the evaluation and equation-solving problems (assuming ${\bf A}$ is square and invertible), we have $\kappa=\vert\vert{\bf A}\vert\vert\;\vert\vert{\bf A}^{-1}\vert\vert$. For the evaluation problem, ${\bf J}_f={\bf A}$, and since $\vert\vert{\bf x}\vert\vert=\vert\vert{\bf A}^{-1}{\bf A}{\bf x}\vert\vert\le\vert\vert{\bf A}^{-1}\vert\vert\;\vert\vert{\bf A}{\bf x}\vert\vert$, we get $\kappa=\vert\vert{\bf A}\vert\vert\;\vert\vert{\bf x}\vert\vert/\vert\vert{\bf A}{\bf x}\vert\vert\le\vert\vert{\bf A}\vert\vert\;\vert\vert{\bf A}^{-1}\vert\vert$, with equality attained by the worst-case input; for the equation-solving problem, the roles of ${\bf A}$ and ${\bf A}^{-1}$ are swapped and the same bound results. If the singular values of ${\bf A}$ are $\{\sigma_1\ge\cdots\ge\sigma_n\}$, then the singular values of ${\bf A}^{-1}$ are $\{1/\sigma_n\ge\cdots\ge 1/\sigma_1\}$, and the norms of the two matrices are their greatest singular values: $\vert\vert{\bf A}\vert\vert=\sigma_{max}=\sigma_1$ and $\vert\vert{\bf A}^{-1}\vert\vert=1/\sigma_{min}=1/\sigma_n$. Now we have

\begin{displaymath}
\kappa({\bf A})=\vert\vert{\bf A}\vert\vert\;\vert\vert{\bf A}^{-1}\vert\vert=\frac{\sigma_{max}}{\sigma_{min}}
\end{displaymath}

We see that the condition number of ${\bf A}$ is large if $\sigma_{max}$ and $\sigma_{min}$ are far apart, and small if they are close.

In fact, $\kappa({\bf A})$ measures how close ${\bf A}$ is to being singular. When ${\bf A}$ is singular, one or more of its singular values are zero, i.e., $\sigma_{min}=0$, and $\kappa({\bf A})=\infty$.

In Matlab, the function cond(A,p) computes the condition number of matrix A based on its p-norm. When $p=2$, this is the ratio between the greatest and smallest singular values. The following will give the same result:
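  s = svd(A);            % singular values of A, sorted in descending order
  kappa = s(1)/s(n)      % ratio of greatest and smallest singular values, same as cond(A,2)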

where n is the dimension of A.

Let us now consider the worst-case scenarios for both evaluating the matrix-vector product ${\bf y}={\bf A}{\bf x}$ and solving the linear equation system ${\bf x}={\bf A}{\bf y}$. (In both cases, given ${\bf x}$ as the input, we find ${\bf y}$ as the output.) To do so, we first compute the SVD of ${\bf A}$, ${\bf A}={\bf U}{\bf\Sigma}{\bf V}^*$, with the singular values in ${\bf\Sigma}$ sorted in descending order: $\sigma_1\ge\cdots\ge\sigma_n$. The inverse of ${\bf A}$ then has the SVD ${\bf A}^{-1}={\bf V}{\bf\Sigma}^{-1}{\bf U}^*$.

Example 1: Consider solving the linear system ${\bf A}{\bf x}={\bf b}$ with

\begin{displaymath}
{\bf A}=\frac{1}{2}\left[\begin{array}{rr}3.0 & 2.0\\ 1.0 & 4.0\end{array}\right],\;\;\;\;\;\;
{\bf A}^{-1}=\left[\begin{array}{rr}0.8 & -0.4\\ -0.2 & 0.6\end{array}\right]
\end{displaymath}

The singular values of ${\bf A}$ are $\sigma_1=2.5583$ and $\sigma_2=0.9772$, so the condition number is

\begin{displaymath}
\kappa=\vert\vert{\bf A}\vert\vert\;\vert\vert{\bf A}^{-1}\vert\vert=\frac{\sigma_1}{\sigma_2}=2.618
\end{displaymath}

which is small, indicating this is a well-behaved system. Given two similar inputs ${\bf b}_1=[1,\;1]^T$ and ${\bf b}_2=[0.99,\;1.01]^T$ with $\delta{\bf b}=[0.01,\;-0.01]^T$, we find the corresponding solutions:

\begin{displaymath}
{\bf x}_1={\bf A}^{-1}{\bf b}_1=\left[\begin{array}{r}0.4\\ 0.4\end{array}\right],\;\;\;\;
{\bf x}_2={\bf A}^{-1}{\bf b}_2=\left[\begin{array}{r}0.388\\ 0.408\end{array}\right],\;\;\;\;
\delta{\bf x}={\bf x}_1-{\bf x}_2=\left[\begin{array}{r}0.012\\ -0.008\end{array}\right]
\end{displaymath}

We have

\begin{displaymath}
\vert\vert\delta{\bf b}\vert\vert=0.0141,\;\;\;\;\vert\vert{\bf b}_1\vert\vert=1.4142,\;\;\;\;
\vert\vert\delta{\bf x}\vert\vert=0.0144,\;\;\;\;\vert\vert{\bf x}_1\vert\vert=0.5657
\end{displaymath}

and

\begin{displaymath}
\frac{\vert\vert\delta{\bf x}\vert\vert/\vert\vert{\bf x}_1\vert\vert}{\vert\vert\delta{\bf b}\vert\vert/\vert\vert{\bf b}_1\vert\vert}
=\frac{0.0144/0.5657}{0.0141/1.4142}
=\frac{0.0255}{0.01}=2.5495
\end{displaymath}

Now consider a different matrix

\begin{displaymath}
{\bf A}=\frac{1}{2}\left[\begin{array}{rr}1.000 & 1.000\\ 1.001 & 0.999\end{array}\right],\;\;\;\;\;\;
{\bf A}^{-1}=\left[\begin{array}{rr}-999 & 1000\\ 1001 & -1000\end{array}\right]
\end{displaymath}

The singular values of ${\bf A}$ are $\sigma_1=1.0$ and $\sigma_2=0.0005$, so the condition number is

\begin{displaymath}
\kappa=\vert\vert{\bf A}\vert\vert\;\vert\vert{\bf A}^{-1}\vert\vert=\frac{\sigma_1}{\sigma_2}=2000
\end{displaymath}

indicating matrix ${\bf A}$ is close to singularity, and the system is ill-conditioned. The solutions corresponding to the same two inputs ${\bf b}_1=[1,\;1]^T$ and ${\bf b}_2=[0.99,\;1.01]^T$ are

\begin{displaymath}
{\bf x}_1={\bf A}^{-1}{\bf b}_1=\left[\begin{array}{r}1\\ 1\end{array}\right],\;\;\;\;
{\bf x}_2={\bf A}^{-1}{\bf b}_2=\left[\begin{array}{r}20.99\\ -19.01\end{array}\right],\;\;\;\;
\delta{\bf x}={\bf x}_1-{\bf x}_2=\left[\begin{array}{r}-19.99\\ 20.01\end{array}\right]
\end{displaymath}

We can further find

\begin{displaymath}
\vert\vert\delta{\bf b}\vert\vert=0.0141,\;\;\;\;\vert\vert{\bf b}_1\vert\vert=1.4142,\;\;\;\;
\vert\vert\delta{\bf x}\vert\vert=28.2843,\;\;\;\;\vert\vert{\bf x}_1\vert\vert=1.4142
\end{displaymath}

and the ratio of the relative changes, which here attains the condition number, is:

\begin{displaymath}
\frac{\vert\vert\delta{\bf x}\vert\vert/\vert\vert{\bf x}_1\vert\vert}{\vert\vert\delta{\bf b}\vert\vert/\vert\vert{\bf b}_1\vert\vert}
=\frac{28.2843/1.4142}{0.0141/1.4142}
=\frac{20}{0.01}=2000
\end{displaymath}

We see that a small relative change $\vert\vert\delta{\bf b}\vert\vert/\vert\vert{\bf b}_1\vert\vert=0.01$ in the input caused a huge change $\vert\vert\delta{\bf x}\vert\vert/\vert\vert{\bf x}_1\vert\vert=20$ in the output (2000 times greater).
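Both systems in this example can be reproduced with a few lines of Matlab (a minimal sketch; cond(A) computes the 2-norm condition number):

  A1 = [3.0 2.0; 1.0 4.0]/2;                % well-conditioned matrix
  A2 = [1.000 1.000; 1.001 0.999]/2;        % ill-conditioned matrix
  b1 = [1; 1];  b2 = [0.99; 1.01];
  for C = {A1, A2}
      A = C{1};
      x1 = A\b1;  x2 = A\b2;                % solve for both right-hand sides
      amp = (norm(x1-x2)/norm(x1))/(norm(b1-b2)/norm(b1));
      fprintf('cond(A) = %9.4f,  amplification = %9.4f\n', cond(A), amp)
  end

This prints a condition number of about 2.62 and an amplification of about 2.55 for the first matrix, and 2000 and 2000 for the second.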

Example 2: Consider the worst case of the evaluation of ${\bf y}={\bf A}{\bf x}$ with

\begin{displaymath}
{\bf A}=\left[\begin{array}{ll}3.1 & 2.0\\ 6.0 & 3.9\end{array}\right]
={\bf U}{\bf\Sigma}{\bf V}^*,\;\;\;\;
{\bf U}=\left[\begin{array}{rr}-0.4582 & -0.8888\\ -0.8888 & 0.4582\end{array}\right],\;\;\;\;
{\bf\Sigma}=\left[\begin{array}{rr}8.0511 & 0\\ 0 & 0.0112\end{array}\right],\;\;\;\;
{\bf V}=\left[\begin{array}{rr}-0.8388 & -0.5444\\ -0.5444 & 0.8388\end{array}\right]
\end{displaymath}

As ${\bf A}$ is almost singular, its condition number is large:

\begin{displaymath}
\kappa=\vert\vert{\bf A}\vert\vert\;\vert\vert{\bf A}^{-1}\vert\vert
=\frac{\sigma_{max}}{\sigma_{min}}=\frac{8.0511}{0.0112}\approx 720
\end{displaymath}

indicating problems associated with this matrix are ill-conditioned. For example, if ${\bf x}={\bf v}_{min}$, we get

\begin{displaymath}
{\bf x}={\bf v}_2=\left[\begin{array}{r}-0.5444\\ 0.8388\end{array}\right],\;\;\;\;
{\bf y}={\bf A}{\bf x}=\sigma_2{\bf u}_2
=\left[\begin{array}{r}-0.0099\\ 0.0051\end{array}\right]
\end{displaymath}

Even for a small change $\delta{\bf x}$ in the input, which causes only a small change $\delta{\bf y}$ in the output:

\begin{displaymath}
\delta{\bf x}=\left[\begin{array}{r}0.01\\ -0.01\end{array}\right],\;\;\;\;
\delta{\bf y}={\bf A}\,\delta{\bf x}
=\left[\begin{array}{r}0.011\\ 0.021\end{array}\right]
\end{displaymath}

the ratio between the relative changes is still large:

\begin{displaymath}
\frac{\vert\vert\delta{\bf y}\vert\vert/\vert\vert{\bf y}\vert\vert}{\vert\vert\delta{\bf x}\vert\vert/\vert\vert{\bf x}\vert\vert}
=\frac{2.1207}{0.0141} \approx 150
\end{displaymath}

Example 3: Consider the worst case while solving the equation ${\bf x}={\bf A}{\bf y}$, where ${\bf A}$ is the same matrix used in the previous example. If ${\bf x}={\bf u}_{max}$, we get

\begin{displaymath}
{\bf x}={\bf u}_1=\left[\begin{array}{r}-0.4582\\ -0.8888\end{array}\right],\;\;\;\;
{\bf y}={\bf A}^{-1}{\bf x}=\frac{1}{\sigma_1}{\bf v}_1
=\left[\begin{array}{r}-0.1042\\ -0.0676\end{array}\right]
\end{displaymath}

Again, even for a small change $\delta{\bf x}$ in the input, which causes only a small change $\delta{\bf y}$ in the output:

\begin{displaymath}
\delta{\bf x}=\left[\begin{array}{r}0.01\\ -0.01\end{array}\right],\;\;\;\;
\delta{\bf y}={\bf A}^{-1}\delta{\bf x}
=\left[\begin{array}{r}0.6556\\ -1.0111\end{array}\right]
\end{displaymath}

the ratio between the relative changes is still large:

\begin{displaymath}
\frac{\vert\vert\delta{\bf y}\vert\vert/\vert\vert{\bf y}\vert\vert}{\vert\vert\delta{\bf x}\vert\vert/\vert\vert{\bf x}\vert\vert}
=\frac{9.7019}{0.0141}\approx 686
\end{displaymath}
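Both worst cases can be reproduced in Matlab directly from the SVD; the signs of the computed singular vectors may differ from those shown above, but the ratios are unaffected (a minimal sketch):

  A = [3.1 2.0; 6.0 3.9];
  [U,S,V] = svd(A);
  dx = [0.01; -0.01];

  x = V(:,2);  y = A*x;   dy = A*dx;        % worst case for evaluation
  (norm(dy)/norm(y))/(norm(dx)/norm(x))     % approximately 150

  x = U(:,1);  y = A\x;   dy = A\dx;        % worst case for equation solving
  (norm(dy)/norm(y))/(norm(dx)/norm(x))     % approximately 686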

Approximate solution

An approximate solution $\hat{\bf x}$ of a linear system ${\bf A}{\bf x}={\bf b}$ can be judged by two different criteria: the size of the residual $\vert\vert{\bf b}-{\bf A}\hat{\bf x}\vert\vert$, and the size of the error $\vert\vert\hat{\bf x}-{\bf x}\vert\vert$ from the true solution ${\bf x}$. The two criteria are consistent only if the problem is well-conditioned. If the problem is ill-conditioned, they can be very different.
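For example, with the ill-conditioned matrix used earlier, an approximate solution can have a tiny residual and still be far from the true solution (a minimal Matlab sketch):

  A = [1.000 1.000; 1.001 0.999]/2;   b = [1; 1];
  x  = A\b;                           % true solution [1; 1]
  xa = x + [1; -1];                   % perturb along the near-null direction of A
  residual = norm(b - A*xa)           % = 0.001, tiny
  err      = norm(xa - x)             % = 1.414, large

By the residual criterion $\hat{\bf x}$ looks nearly perfect, while by the error criterion it is far from the true solution.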


Ruye Wang 2015-02-12