The theoretical foundation of ICA is the
*Central Limit Theorem*,
which states that the distribution of the sum (average or linear
combination) of $n$ independent random variables approaches Gaussian
as $n \rightarrow \infty$. For example, the face value of a single die has a
uniform distribution from 1 to 6, but the distribution of the sum of a
pair of dice is no longer uniform: it has a maximum probability at the
mean of 7. The distribution of the sum of the face values is better
approximated by a Gaussian as the number of dice increases.
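
This can be checked numerically. The sketch below (assuming NumPy is available) rolls 1, 2, and 10 dice and measures the excess kurtosis of the sum, which is zero for an exact Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

results = {}
for k in (1, 2, 10):
    # Roll k dice 100000 times and sum the face values of each roll.
    sums = rng.integers(1, 7, size=(100_000, k)).sum(axis=1)
    # Standardize, then compute excess kurtosis (0 for a Gaussian).
    z = (sums - sums.mean()) / sums.std()
    results[k] = (float(sums.mean()), float((z**4).mean() - 3))
    print(k, results[k])
```

For two dice the sample mean is close to 7, and the magnitude of the excess kurtosis shrinks as the number of dice grows, i.e., the sum becomes more Gaussian.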

Specifically, if $x_1, \dots, x_n$ are random variables independently drawn from an arbitrary distribution with mean $\mu$ and variance $\sigma^2$, then the distribution of the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ approaches Gaussian with mean $\mu$ and variance $\sigma^2/n$ as $n \rightarrow \infty$.
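
A quick numerical check of this statement (a sketch assuming NumPy; draws are uniform on $[0, 1]$, so $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# 100000 sample means, each averaging n draws from Uniform(0, 1).
means = rng.random(size=(100_000, n)).mean(axis=1)
print(means.mean())     # close to mu = 0.5
print(means.var() * n)  # close to sigma^2 = 1/12
```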

To solve the BSS problem $\mathbf{x} = A\mathbf{s}$, we want to find an unmixing matrix $W$ so that $\mathbf{y} = W\mathbf{x}$ is as close to the independent sources $\mathbf{s}$ as possible. This can be seen as the reverse process of the central limit theorem above.

Consider one component $y_i = \mathbf{w}_i^T \mathbf{x}$
of $\mathbf{y} = W\mathbf{x}$, where
$\mathbf{w}_i^T$ is the $i$th row of $W$. Since $\mathbf{x} = A\mathbf{s}$,
we have $y_i = \mathbf{w}_i^T A\,\mathbf{s}$; as a linear combination
of all components of $\mathbf{s}$, $y_i$ is necessarily more Gaussian
than any of the source components unless it is equal to one of them (i.e.,
unless $\mathbf{w}_i^T A$ has only one non-zero component).
In other words, the goal
can be achieved by finding $\mathbf{w}_i$
that maximizes the **non-Gaussianity** of
$y_i$ (so that $y_i$ is least Gaussian). This is
the essence of all ICA methods. Obviously, if all source variables are
Gaussian, the ICA method will not work: any orthogonal mixture of
independent Gaussians is again a set of independent Gaussians, so
non-Gaussianity gives no direction to optimize.
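
The claim that a mixture is more Gaussian than its non-Gaussian sources can be illustrated numerically. The sketch below (assuming NumPy; the sources and mixing weights are chosen arbitrarily for illustration) compares excess kurtosis, one common measure of non-Gaussianity:

```python
import numpy as np

rng = np.random.default_rng(2)

def excess_kurtosis(v):
    """Excess kurtosis of a sample; 0 for an exact Gaussian."""
    z = (v - v.mean()) / v.std()
    return float((z**4).mean() - 3)

# Two independent non-Gaussian (uniform) sources.
s1 = rng.uniform(-1, 1, 200_000)
s2 = rng.uniform(-1, 1, 200_000)
mix = 0.6 * s1 + 0.8 * s2  # a linear combination of the sources

print(excess_kurtosis(s1))   # strongly negative (uniform is sub-Gaussian)
print(excess_kurtosis(mix))  # closer to 0, i.e., more Gaussian
```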

Based on the above discussion, we get the following requirements and constraints for ICA methods:

- The number of observed variables must be no fewer than the number of independent sources (we assume the two numbers are equal in the following).
- The source components are stochastically independent, and have to be non-Gaussian (with the possible exception of at most one Gaussian component).
- The estimation of $\mathbf{s}$ and $A$ is only up to a scaling factor $c_i \neq 0$ for each component.
Let $s'_i = c_i s_i$
and $\mathbf{a}'_i = \mathbf{a}_i / c_i$,
where $\mathbf{a}_i$ is the $i$th column of $A$; we have

$$\mathbf{x} = A\mathbf{s} = \sum_i \mathbf{a}_i s_i = \sum_i \frac{\mathbf{a}_i}{c_i}\,(c_i s_i) = \sum_i \mathbf{a}'_i s'_i = A'\mathbf{s}'$$

Also the scaling factor $c_i$ could be either positive or negative, so the sign of each component is indeterminate as well. For this reason, we will always assume the independent components have unit variance $E[s_i^2] = 1$. As they are also uncorrelated (all independent variables are uncorrelated), we have $E[s_i s_j] = \delta_{ij}$, i.e.,

$$E[\mathbf{s}\mathbf{s}^T] = I$$

- The estimated independent components are not in any particular order. When the order of the corresponding components of $\mathbf{s}$ and columns of $A$ is rearranged in the same way, $\mathbf{x} = A\mathbf{s}$ still holds.
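
The scaling and permutation ambiguities above can be verified directly. A small sketch (assuming NumPy; the mixing matrix and sources are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 2))        # a hypothetical mixing matrix
s = rng.uniform(-1, 1, size=(2, 500))  # two independent sources
x = A @ s

# Scaling ambiguity: scale source i by c_i, divide column i of A by c_i.
c = np.array([2.0, -0.5])              # signs can flip too
x_scaled = (A / c) @ (s * c[:, None])

# Permutation ambiguity: reorder the sources and the columns of A together.
P = np.array([[0, 1], [1, 0]])
x_perm = (A @ P.T) @ (P @ s)

print(np.allclose(x, x_scaled), np.allclose(x, x_perm))  # True True
```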

All ICA methods are based on the same fundamental approach: find a
matrix $W$ that maximizes the non-Gaussianity of the components of
$\mathbf{y} = W\mathbf{x}$, thereby maximizing their independence. They can be
formulated as an optimization process (always iterative) for the matrix $W$
that maximizes some objective function $J$ measuring the degree of
non-Gaussianity or independence of the estimated components $\mathbf{y} = W\mathbf{x}$:

$$W^* = \arg\max_W J(W\mathbf{x})$$

In the following, we will discuss some common objective functions.
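
As a concrete preview, here is a minimal sketch of such an optimization (assuming NumPy). It uses the absolute excess kurtosis of a projection as the objective and a crude grid search over directions after whitening; these are illustrative choices, not any specific published ICA algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two independent uniform (sub-Gaussian) sources, mixed linearly.
s = rng.uniform(-1, 1, size=(2, 50_000))
A = np.array([[1.0, 0.5], [0.3, 1.0]])
x = A @ s

# Whiten the mixtures: afterwards W can be restricted to a rotation.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E / np.sqrt(d)) @ E.T @ x

def objective(w, z):
    """|excess kurtosis| of the projection w^T z: a non-Gaussianity measure."""
    y = w @ z
    return abs((y**4).mean() / (y**2).mean() ** 2 - 3)

# Crude optimization: scan directions for the most non-Gaussian projection.
angles = np.linspace(0, np.pi, 360)
best = max(angles, key=lambda a: objective(np.array([np.cos(a), np.sin(a)]), z))
w = np.array([np.cos(best), np.sin(best)])
y = w @ z

# The recovered component should match one source up to scale and sign.
corr = np.corrcoef(y, s)[0, 1:]
print(np.max(np.abs(corr)))  # close to 1
```

Practical ICA algorithms replace the grid search with an efficient iteration and often replace kurtosis with more robust objective functions, which are discussed next.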