Cubic Spline Interpolation

All previously discussed methods of polynomial interpolation fit a set of $n+1$ given points $y_i=f(x_i),\;(i=0,\cdots,n)$ by an nth degree polynomial, and a higher degree polynomial is needed to fit a larger set of data points. A major drawback of such methods is overfitting, as domonstrated by the following example.


Vased on $n+1=11$ equally spaced points from $x_0=-5$ to $x_{10}=5$ with increment of 1, a function $y=f(x)=1/(1+x^2)$ can be approximated by any of the interpolation methods discussed above by polynomial of degree $n=10$, as shown in the figure below. We note that the approximation is very poor towards to the two ends where the error $R_n(x)=f(x)-N_n(x)$ is disappointingly high. This is known as Runge's phenomenon, indicating the fact that higher degree polynomial interpolation does not necessarily always produce more accurate result, as the degree of the interpolating polynomial may become unnecessarily high and the polynomial may become oscillatory.

This Runge's phenominon is a typical example of overfitting, due to an excessively complex model with too many parameters relative to the observed data, here specifically a polynomial of a degree too high (requiring too many coefficients) to model the given data points.


Now we consider a different method of spline interpolation, which fits the given points by a piecewise polynomial function $S(x)$, known as the spline, a composite function formed by $n$ low-degree polynomials $P_i(x)$ each fitting $f(x)$ in the interval between $x_{i-1}$ and $x_i,\;(i=1,\cdots,n)$:

$\displaystyle S(x)=\left\{\begin{array}{cc}P_1(x) & x_0\le x\le x_1\\
\vdots &...
...e x\le x_i\\
\vdots & \vdots\\
P_n(x) & x_{n-1}\le x\le x_n\end{array}\right.$ (61)

As this method does not use a single polynomial of degree $n$ to fit all $n+1$ points at once, it avoids high degree polynomials and thereby the potential problem of overfitting. These low-degree polynomials need to be such that the spline $S(x)$ they form is not only continuous but also smooth.

In the following we consider approximating $f(x)$ between any two consecutive points $x_i$ and $x_{i+1}$ by a linear, quadratic, and cubic polynomial $P_i(x)$ (of first, second, and third degree).


A function $y=f(x)=x\,\sin(2x+\pi/4)+1$ is sampled at the following $n+1=4$ points:

\begin{displaymath}\begin{array}{c\vert\vert c\vert c\vert c\vert c}\hline
i & 0...
...(x_i) & 1.937 & 1.000 & 1.349 & -0.995 \\ \hline
\end{array}\\ \end{displaymath}    

The interpolation results based on linear, quadratic and cubic splines are shown in the figure below, together with the original function $f(x)$, and the $n=3$ interpolating polynomials $P_i(x),
\;Q_i(x),\;C_i(x),\;(i=1,\cdots,3)$, used as the ith segment of $S(x)$ between $x_{i-1}$ and $x_i$.

For the quadratic interpolation, based on $D_0=f'(x_0)=-1.635$ we get $D_1=-0.240,\;D_2=0.937,\;D_3=-5.624$. For the cubic interpolation, we solve the following equation

$\displaystyle \left[\begin{array}{llll}2 & 1 & 0 & 0\\ 0.5 & 2 & 0.5 & 0\\
0 &...]
=\left[\begin{array}{r}4.185\\ 3.858\\ -8.076\\ 9.827\end{array}\right]
\\ $    

and get $M_0=0.281,\;M_1=3.622,\;M_2=-7.054,\;M_3=8.440$.

The errors of these three methods are $\epsilon=0.2103$, $\epsilon=0.4371$, and $\epsilon=0.0615$, respectively. Obviously the higher the degree of the interpolating polynomial, the higher the accuracy. The error of the cubic spline method is significantly smaller than $\epsilon=0.3063$ of the polynomial interpolation.


The Matlab code that implements the cubic spline method is listed below.

function [S C]=Spline3(u,x,y,dya,dyb)
    % vectors x and y contain n+1 points and the corresponding function values
    % vector u contains all discrete samples of the continuous argument of f(x)
    % dya and dyb are the derivatives f'(x_0) and f'(x_n), respectively 
    n=length(x);       % number of interpolating points
    k=length(u);       % number of discrete sample points
    C=zeros(n,k);      % the n-1 cubic interpolating polynomials
    A=2*eye(n);        % coefficient matrix on left-hand side
    d=zeros(n,1);      % vector on right-hand side
    d(1)=((y(2)-y(1))/(x(2)-x(1))-dya)/h0;  % first element of d
    for i=2:n-1
        d(i)=((y(i+1)-y(i))/h1-(y(i)-y(i-1))/h0)/h2; % 2nd divided difference
    d(n)=(dyb-(y(n)-y(n-1))/h1)/h1;   % last element of d
    M=6*inv(A)*d;                     % solving linear equation system for M's
    for i=2:n
        C(i-1,:)=(x1.^3*M(i-1)+x0.^3*M(i))/6/h... % the ith cubic polynomial
        idx=find(u>x(i-1) & u<=x(i));  % indices between x(i-1) and x(i)
        S(idx)=C(i-1,idx);             % constructing spline by cubic polynomials


The function $y=f(x)=1/(1+x^2)$ used before is now approximated by both the Newton's method and the cubic spline method, with very different results as shown below. The Runge's phenomenon suffered by Newton's method is certainly avoided by the cubic spline method.