The mutual information of two random variables $x$ and $y$ is
defined as
\[
I(x,y) = H(x) + H(y) - H(x,y)
       = \int\!\!\int p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}\, dx\, dy
\]
Obviously when $x$ and $y$ are independent, i.e., $p(x,y)=p(x)\,p(y)$ and $H(x,y)=H(x)+H(y)$, their mutual information is zero.
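As a quick numerical illustration (a sketch; the helper `mutual_info` and the two toy joint distributions are my own constructions, not part of the derivation), the definition can be evaluated directly for discrete variables:

```python
import numpy as np

def mutual_info(pxy):
    """I(x;y) = sum over x,y of p(x,y) log[ p(x,y) / (p(x)p(y)) ], in nats."""
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x), column vector
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    mask = pxy > 0                        # skip zero-probability cells
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

# Two independent fair coins: p(x,y) = p(x)p(y), so I = 0
indep = np.array([[0.25, 0.25],
                  [0.25, 0.25]])
print(mutual_info(indep))     # -> 0.0

# Perfectly dependent coins (y = x): I = H(x) = log 2
dep = np.array([[0.5, 0.0],
                [0.0, 0.5]])
print(mutual_info(dep))       # -> 0.6931... (= log 2)
```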

Similarly, the mutual information
of a set of $n$
variables $y_i$ ($i=1,\dots,n$) is defined as
\[
I(y_1,\dots,y_n) = \sum_{i=1}^n H(y_i) - H(y_1,\dots,y_n)
\]
If a random vector $\mathbf{y}=[y_1,\dots,y_n]^T$ is a linear transform of another random vector $\mathbf{x}=[x_1,\dots,x_n]^T$:
\[
\mathbf{y} = \mathbf{W} \mathbf{x}
\]
then the entropy of $\mathbf{y}$ is related to that of $\mathbf{x}$ by:
\[
H(\mathbf{y}) = H(\mathbf{x}) + \log |\det \mathbf{W}|
\]
where $\det \mathbf{W}$ is the Jacobian of the above transformation:
\[
\det \frac{\partial (y_1,\dots,y_n)}{\partial (x_1,\dots,x_n)} = \det \mathbf{W}
\]
The mutual information above can be written as
\[
I(y_1,\dots,y_n) = \sum_{i=1}^n H(y_i) - H(\mathbf{y})
 = \sum_{i=1}^n H(y_i) - H(\mathbf{x}) - \log |\det \mathbf{W}|
\]
We further assume the $y_i$ to be uncorrelated and of unit variance, i.e., the covariance matrix of $\mathbf{y}$ is
\[
E[\mathbf{y}\mathbf{y}^T] = \mathbf{W}\, E[\mathbf{x}\mathbf{x}^T]\, \mathbf{W}^T = \mathbf{I}
\]
and its determinant is
\[
1 = \det(\mathbf{I}) = \det(\mathbf{W})\, \det(E[\mathbf{x}\mathbf{x}^T])\, \det(\mathbf{W}^T)
\]
This means $\det \mathbf{W}$ is a constant (the same for any $\mathbf{W}$ satisfying the constraint, since $\det(E[\mathbf{x}\mathbf{x}^T])$ is fixed by the data). Also, as the second term $H(\mathbf{x})$ in the mutual information expression is also a constant (invariant with respect to $\mathbf{W}$), we have
\[
I(y_1,\dots,y_n) = \sum_{i=1}^n H(y_i) + \mathrm{const.}
\]
i.e., minimization of the mutual information is achieved by minimizing the entropies $H(y_i)$.
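The claim that the unit-variance constraint pins down $|\det \mathbf{W}|$ can be verified numerically. Any such $\mathbf{W}$ is a rotation of a whitening matrix, so $|\det \mathbf{W}| = \det(E[\mathbf{x}\mathbf{x}^T])^{-1/2}$ for every choice (a sketch; the symmetric whitening construction and the sample covariance below are my own choices for illustration):

```python
import numpy as np

Cx = np.array([[2.0, 0.7],
               [0.7, 1.0]])                      # covariance of x, fixed by the data

# Symmetric whitening matrix W0 = Cx^(-1/2), built from the eigendecomposition
vals, vecs = np.linalg.eigh(Cx)
W0 = vecs @ np.diag(vals ** -0.5) @ vecs.T       # W0 Cx W0^T = I

# Composing W0 with any rotation R still satisfies W Cx W^T = I,
# and |det W| = |det R| * |det W0| = det(Cx)^(-1/2) regardless of R.
for theta in (0.0, 0.4, 1.3):
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    W = R @ W0
    print(abs(np.linalg.det(W)), np.linalg.det(Cx) ** -0.5)   # equal every time
```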

As the Gaussian density has maximal entropy among all densities of a given variance, minimizing the entropies $H(y_i)$ is equivalent to maximizing the non-Gaussianity of the $y_i$.
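For a concrete instance of this maximal-entropy property (a sketch using the closed-form entropies of the two densities; the comparison with a uniform density is my own choice of example): a unit-variance uniform variable has strictly lower entropy than the unit-variance Gaussian, and the gap between the two is exactly the negentropy.

```python
import numpy as np

# Differential entropy (nats) of a Gaussian with unit variance: (1/2) log(2*pi*e)
H_gauss = 0.5 * np.log(2 * np.pi * np.e)    # about 1.4189

# A uniform variable on [-sqrt(3), sqrt(3)] also has unit variance,
# and its entropy is log(2*sqrt(3))
H_uniform = np.log(2 * np.sqrt(3))          # about 1.2425

# The Gaussian has the larger entropy; the gap is the negentropy J >= 0
print(H_gauss - H_uniform)                  # about 0.1765
```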

Moreover, since all $y_i$ have the same unit variance, their negentropy becomes
\[
J(y_i) = H(y_{gauss}) - H(y_i)
\]
where $H(y_{gauss}) = \frac{1}{2}\log(2\pi e)$ is the entropy of a Gaussian with unit variance, the same for all $y_i$. Substituting $H(y_i) = H(y_{gauss}) - J(y_i)$ into the expression of the mutual information, and realizing the other two terms $H(\mathbf{x})$ and $\log|\det\mathbf{W}|$ are both constant (the same for any $\mathbf{W}$), we get
\[
I(y_1,\dots,y_n) = C - \sum_{i=1}^n J(y_i)
\]
where $C$ is a constant (collecting the terms $n H(y_{gauss})$, $H(\mathbf{x})$ and $\log|\det\mathbf{W}|$) which is the same for any linear transform matrix $\mathbf{W}$. This is the fundamental relation between the mutual information and the negentropy of the variables $y_i$: if the mutual information $I(y_1,\dots,y_n)$ of a set of variables is decreased (indicating the variables are less dependent), then the total negentropy $\sum_i J(y_i)$ is increased, and the $y_i$ are less Gaussian. We want to find a linear transform matrix $\mathbf{W}$ that minimizes the mutual information $I(y_1,\dots,y_n)$, or, equivalently, maximizes the negentropy $\sum_i J(y_i)$ (under the assumption that the $y_i$ are uncorrelated).
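The equivalence can be demonstrated end to end with a minimal sketch. Here I rotate two whitened independent uniform sources and search for the rotation maximizing the summed negentropy; since exact entropies are unavailable from samples, the sketch assumes the standard kurtosis-squared approximation $J(y) \approx \frac{1}{48}\,\mathrm{kurt}(y)^2$ as a proxy (an assumption of this example, not part of the derivation above). The maximum lands at a multiple of $\pi/2$, where the original sources are recovered:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
# Two independent, zero-mean, unit-variance uniform sources (non-Gaussian)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

def kurt(y):
    """Excess kurtosis of a sample (zero for a Gaussian)."""
    y = (y - y.mean()) / y.std()
    return (y ** 4).mean() - 3.0

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

# Rotations preserve whiteness and the joint entropy, so minimizing the
# mutual information of y = R(t) s is maximizing the summed negentropy,
# approximated here by sum_i kurt(y_i)^2 (constant factor 1/48 dropped).
def proxy_negentropy(t):
    return sum(kurt(y) ** 2 for y in rot(t) @ s)

best = max(np.linspace(0, np.pi / 2, 91), key=proxy_negentropy)
print(best)   # close to 0 or pi/2: the unmixing rotation recovers the sources
```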