Many orthogonal transforms, such as the discrete cosine transform (DCT) and the Walsh-Hadamard transform (WHT), can be used as well as the Fourier transform. In each case, the $N$ sample values of a given signal (the $N$ components of the signal vector) are transformed into another set of $N$ values (the $N$ components of the rotated vector), and the two sets of $N$ signal components, before and after the transform, carry the same amount of energy and information. Why do we take the trouble to carry out such an orthogonal transform (a rotation in the $N$-dimensional space)? What do we gain by doing so?

In general, orthogonal transforms have the following two main properties:

**Orthogonal transforms tend to decorrelate the components of a given signal.** Given a sample $x[n]=5$ of a *natural* signal (measurements of some physical variable such as voltage, pressure, or temperature over time), the value of the next sample $x[n+1]$ can be estimated to be in the neighborhood of 5 (extrapolation). Moreover, if $x[n+2]$ is also given, the value of $x[n+1]$ can be estimated as the average of its two neighbors (interpolation). We can do so with some confidence because we expect the signal to be highly correlated (e.g., between $x[n]$ and $x[n+1]$). In general, a natural signal is expected to be smooth, without major discontinuities, which would correspond to a large amount of energy. However, the same is not necessarily true in the frequency domain. Knowing a Fourier coefficient $X[k]$ does not help estimate the next coefficient $X[k+1]$, as the Fourier coefficients are usually not as correlated as the signal samples in the time domain.

**Orthogonal transforms tend to redistribute the energy contained in the signal so that most of the energy is contained in a small number of components.** The amplitude of a natural signal tends to stay in the same range over time, so its power (proportional to its magnitude squared) stays at about the same level. In other words, the energy in the signal tends to be distributed relatively evenly over time. In the frequency domain, however, the energy (the same total amount as in the time domain, by Parseval's theorem) is mostly concentrated in a small number of frequency components, most obviously the DC component and the fundamental frequencies. The amplitudes (and thereby the power) of most high-frequency harmonics are small, and therefore they carry little energy.
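The two properties can be checked numerically. The sketch below is only an illustration under stated assumptions: a first-order autoregressive process with coefficient 0.95 stands in for a "natural" signal, the orthonormal DCT-II is used in place of the Fourier transform so that all coefficients are real, and the helper names (`dct_matrix`, `ar1_signals`, `adjacent_correlation`, `top_k_energy_fraction`) are hypothetical, not taken from any library.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix C (rows are basis vectors), so C @ C.T = I."""
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)  # rescale the DC row to unit length
    return C

def ar1_signals(M, N, rho, seed=0):
    """M realizations of a stationary AR(1) process of length N (assumed signal model)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((M, N))
    x = np.empty((M, N))
    x[:, 0] = w[:, 0] / np.sqrt(1.0 - rho**2)  # start in the stationary distribution
    for n in range(1, N):
        x[:, n] = rho * x[:, n - 1] + w[:, n]
    return x

def adjacent_correlation(data):
    """Mean absolute correlation between neighbouring components (columns)."""
    R = np.corrcoef(data, rowvar=False)
    return np.mean(np.abs(np.diag(R, k=1)))

def top_k_energy_fraction(data, k):
    """Fraction of the average energy held by the k most energetic components."""
    energy = np.mean(data**2, axis=0)
    return np.sort(energy)[::-1][:k].sum() / energy.sum()

N = 32
C = dct_matrix(N)
x = ar1_signals(M=2000, N=N, rho=0.95)  # each row is one signal
X = x @ C.T                             # each row is the DCT of that signal

# The rotation preserves energy (Parseval's theorem).
print("energy before:", np.sum(x**2), "after:", np.sum(X**2))

# Property 1: neighbouring components are far less correlated after the transform.
print("adjacent correlation, time domain:", adjacent_correlation(x))
print("adjacent correlation, DCT domain: ", adjacent_correlation(X))

# Property 2: a few transform components hold most of the energy.
print("energy in top 4 of 32, time domain:", top_k_energy_fraction(x, 4))
print("energy in top 4 of 32, DCT domain: ", top_k_energy_fraction(X, 4))
```

The correlation and energy statistics are estimated over an ensemble of realizations, since a single signal cannot show how correlated two components are.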

Due to these two properties of orthogonal transforms, it is more convenient and efficient to carry out various signal processing operations, such as information extraction (e.g., filtering) and signal compression, in the transform domain. (For a more detailed discussion, refer to this page.) This is the main reason why orthogonal transforms (DFT, DCT, PCA, etc.) are widely used for signal processing in many fields of science and engineering.

**Principal Component Analysis**

A particular transform called the *Karhunen-Loeve (KL) transform*, also
called the *principal component analysis (PCA) transform*, is optimal
among all possible orthogonal transforms in terms of the two properties
discussed above. After the PCA transform, the signal components are completely
decorrelated, and the energy contained in the signal is maximally
concentrated in a small number of components.
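As a rough numerical check of these two claims, the sketch below builds the transform from the eigenvectors of the sample covariance matrix and compares it with an arbitrary orthogonal transform. The synthetic data (white noise passed through a random mixing matrix), the dimensions, and the helper name `klt_basis` are assumptions made for illustration only.

```python
import numpy as np

def klt_basis(data):
    """Mean, eigenvalues, and eigenvectors of the sample covariance,
    sorted by decreasing eigenvalue (rows of `data` are observations)."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(1)
N, M = 8, 5000
mixing = rng.standard_normal((N, N))        # makes the components correlated
x = rng.standard_normal((M, N)) @ mixing.T

mean, eigvals, V = klt_basis(x)
y = (x - mean) @ V                          # each row: y = V^T (x - mean)

# Complete decorrelation: the covariance of y is diagonal,
# with the eigenvalues on the diagonal.
cov_y = np.cov(y, rowvar=False)
off_diag = cov_y - np.diag(np.diag(cov_y))
print("largest off-diagonal covariance after PCA:", np.abs(off_diag).max())

# Energy compaction: compare with an arbitrary orthogonal transform Q.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
z = (x - mean) @ Q
var_z = np.sort(np.var(z, axis=0, ddof=1))[::-1]

k = 2
pca_fraction = eigvals[:k].sum() / eigvals.sum()
other_fraction = var_z[:k].sum() / var_z.sum()
print("energy in top", k, "components, PCA:   ", pca_fraction)
print("energy in top", k, "components, random:", other_fraction)
```

Both transforms preserve the total variance, so the two fractions are directly comparable; the PCA fraction is never smaller than that of any other orthogonal transform applied to the same data.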

The figure above illustrates an orthogonal transform in a two-dimensional space. The signal components $x_1$ and $x_2$ are highly correlated, and they carry about the same amount of information (dynamic energy). But after the rotation corresponding to an orthogonal transform, the components $y_1$ and $y_2$ are decorrelated, and most of the information is concentrated in component $y_1$. If we keep only $y_1$ and ignore $y_2$, a 50% compression rate can be achieved without losing much information in the signal.
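The two-dimensional case can be sketched in a few lines. The data here are synthetic, not those of the figure: both components are assumed to be a common signal plus small independent noise, and the rotation angle of 45 degrees is chosen because the two components have equal variance.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 5000
s = rng.standard_normal(M)                          # shared underlying signal
x = np.stack([s + 0.1 * rng.standard_normal(M),     # x1
              s + 0.1 * rng.standard_normal(M)])    # x2, shape (2, M)

theta = np.pi / 4                                   # assumed rotation angle
A = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])     # orthogonal: A @ A.T = I
y = A @ x                                           # y1 = (x1+x2)/sqrt(2), y2 = (x2-x1)/sqrt(2)

corr_x = np.corrcoef(x)[0, 1]
corr_y = np.corrcoef(y)[0, 1]
energy_share_y1 = np.sum(y[0]**2) / np.sum(y**2)

# Keep y1 only (50% compression), then rotate back to reconstruct x.
y_kept = y.copy()
y_kept[1] = 0.0
x_hat = A.T @ y_kept
rel_error = np.sum((x - x_hat)**2) / np.sum(x**2)

print("correlation of (x1, x2):", corr_x)
print("correlation of (y1, y2):", corr_y)
print("share of energy in y1:  ", energy_share_y1)
print("relative reconstruction error:", rel_error)
```

Because the transform is orthogonal, the relative reconstruction error equals the share of energy that was carried by the discarded component $y_2$.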