First consider nonlinearly mapping all data points $\mathbf{x}_n$ ($n = 1, \dots, N$) to $\phi(\mathbf{x}_n)$ in a higher dimensional feature space $\mathcal{F}$, where the covariance matrix can be estimated as
$$C = \frac{1}{N} \sum_{n=1}^{N} \phi(\mathbf{x}_n)\, \phi(\mathbf{x}_n)^T.$$
Any new data point $\mathbf{x}$ can now be mapped to $\phi(\mathbf{x})$ in the high-dimensional space $\mathcal{F}$, where its $i$-th PCA component can be obtained as its projection on the $i$-th eigenvector $\mathbf{v}_i$ of $C$:
$$y_i = \mathbf{v}_i^T \phi(\mathbf{x}) = \sum_{n=1}^{N} \alpha_n^{(i)}\, k(\mathbf{x}_n, \mathbf{x}),$$
where $\mathbf{v}_i = \sum_{n=1}^{N} \alpha_n^{(i)} \phi(\mathbf{x}_n)$ and the coefficient vector $\alpha^{(i)}$ is the $i$-th eigenvector of the $N \times N$ kernel matrix $K$ with entries $K_{mn} = k(\mathbf{x}_m, \mathbf{x}_n) = \phi(\mathbf{x}_m)^T \phi(\mathbf{x}_n)$.
These are the steps for the implementation of the kernel PCA algorithm:

1. Choose a kernel function $k(\mathbf{x}, \mathbf{y})$ and compute the $N \times N$ kernel matrix $K$ with $K_{mn} = k(\mathbf{x}_m, \mathbf{x}_n)$.
2. Center the kernel matrix: $\tilde{K} = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N$, where $\mathbf{1}_N$ is the $N \times N$ matrix with all entries equal to $1/N$.
3. Solve the eigenvalue problem $\tilde{K} \alpha^{(i)} = \lambda_i \alpha^{(i)}$, and normalize each $\alpha^{(i)}$ so that $\lambda_i\, \alpha^{(i)T} \alpha^{(i)} = 1$ (i.e., so that $\mathbf{v}_i$ has unit length).
4. Project any data point $\mathbf{x}$ onto the $i$-th component: $y_i = \sum_{n=1}^{N} \alpha_n^{(i)} k(\mathbf{x}_n, \mathbf{x})$.
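The steps above can be sketched in NumPy as follows. This is a minimal illustration, not a production implementation; the function name `kernel_pca` and the choice of an RBF kernel are assumptions for the example.

```python
import numpy as np

def kernel_pca(X, kernel, n_components):
    """Minimal kernel PCA sketch: X is (N, d); kernel(a, b) returns a scalar."""
    N = X.shape[0]
    # Step 1: build the N x N kernel matrix K_mn = k(x_m, x_n)
    K = np.array([[kernel(xm, xn) for xn in X] for xm in X])
    # Step 2: center the kernel matrix in feature space
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Step 3: eigendecompose (eigh returns eigenvalues in ascending order),
    # keep the top components, and normalize so the feature-space
    # eigenvectors have unit length: lambda_i * ||alpha_i||^2 = 1
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))
    # Step 4: project the training data onto the components
    return Kc @ alpha

# usage: Gaussian (RBF) kernel with sigma = 1
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2.0)
X = np.random.default_rng(0).normal(size=(20, 3))
Y = kernel_pca(X, rbf, n_components=2)
print(Y.shape)  # (20, 2)
```

Note that because the kernel matrix is centered, the resulting components have zero mean over the training set, as expected of PCA scores.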
Some commonly used kernels are:

- Linear: $k(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T \mathbf{y}$
- Polynomial: $k(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T \mathbf{y} + c)^d$
- Gaussian (RBF): $k(\mathbf{x}, \mathbf{y}) = \exp\!\left(-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2\right)$
- Sigmoid: $k(\mathbf{x}, \mathbf{y}) = \tanh(a\, \mathbf{x}^T \mathbf{y} + b)$
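For concreteness, these kernels can be written directly as functions of two vectors. The function names and the default parameter values ($c = 1$, $d = 3$, $\sigma = 1$) below are illustrative choices, not prescribed by the algorithm.

```python
import numpy as np

def linear_kernel(a, b):
    # k(x, y) = x^T y
    return a @ b

def polynomial_kernel(a, b, c=1.0, d=3):
    # k(x, y) = (x^T y + c)^d; c and d are illustrative defaults
    return (a @ b + c) ** d

def rbf_kernel(a, b, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
print(linear_kernel(a, b))      # 11.0
print(polynomial_kernel(a, b))  # (11 + 1)^3 = 1728.0
```

Any such function can be passed wherever a kernel is needed, as long as it yields a symmetric positive semidefinite kernel matrix.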
The above discussion is based on the assumption that the data points have
zero mean, so that the covariance of two data points is the same as
their correlation. While the data points can be easily de-meaned in the
original space by subtracting the mean vector from all points, this is done
differently in the feature space. The de-meaned data points in the feature
space are $\tilde{\phi}(\mathbf{x}_n) = \phi(\mathbf{x}_n) - \frac{1}{N} \sum_{m=1}^{N} \phi(\mathbf{x}_m)$, which leads to the centered kernel matrix
$$\tilde{K} = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N,$$
where $\mathbf{1}_N$ is the $N \times N$ matrix with all entries equal to $1/N$.
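The equivalence between centering the kernel matrix and de-meaning in feature space can be checked numerically. This sketch uses the identity feature map $\phi(\mathbf{x}) = \mathbf{x}$ (so $K = X X^T$), which is an assumption made purely so that both sides can be computed explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))

# Explicit feature map phi(x) = x for illustration, so K = Phi Phi^T
Phi = X
K = Phi @ Phi.T
N = K.shape[0]
one = np.full((N, N), 1.0 / N)

# Centered kernel matrix: K~ = K - 1_N K - K 1_N + 1_N K 1_N
Kc = K - one @ K - K @ one + one @ K @ one

# Compare against de-meaning the feature vectors directly
Phi_c = Phi - Phi.mean(axis=0)
print(np.allclose(Kc, Phi_c @ Phi_c.T))  # True
```

The same identity holds for any feature map, which is what allows centering to be done entirely through the kernel matrix.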
In summary, the mapping $\phi$ in the kernel PCA algorithm is never explicitly specified, nor is the dimensionality of the feature space $\mathcal{F}$. Similarly, the covariance matrix $C$ and its eigenvectors $\mathbf{v}_i$ appear only in the above derivation; they do not need to be computed in the implementation. The potentially high dimensionality of $\mathcal{F}$ incurs no extra computation, as only the kernel function $k(\mathbf{x}, \mathbf{y})$ is needed in the implementation.
Kernel PCA has all the advantages of regular PCA, as well as the implicit nonlinear mapping to a feature space where the features representing the structure in the data may be better extracted. For example, data classification may be carried out much more easily in $\mathcal{F}$ due to linear separability.