# FastICA

FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology.[1][2] Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-Gaussianity serves as a proxy for statistical independence, which is a very strong condition and would require infinite data to verify exactly. FastICA can alternatively be derived as an approximate Newton iteration.

## Algorithm

### Prewhitening the data

Let ${\displaystyle \mathbf {X} :=(x_{ij})\in \mathbb {R} ^{N\times M}}$ denote the input data matrix, where ${\displaystyle M}$, the number of columns, corresponds to the number of samples of the mixed signals, and ${\displaystyle N}$, the number of rows, corresponds to the number of independent source signals. The input data matrix ${\displaystyle \mathbf {X} }$ must be prewhitened, that is, centered and whitened, before the FastICA algorithm can be applied to it.

• Centering the data entails demeaning each component of the input data ${\displaystyle \mathbf {X} }$, that is,
${\displaystyle x_{ij}\leftarrow x_{ij}-{\frac {1}{M}}\sum _{j^{\prime }}x_{ij^{\prime }}}$
for each ${\displaystyle i=1,\ldots ,N}$ and ${\displaystyle j=1,\ldots ,M}$. After centering, each row of ${\displaystyle \mathbf {X} }$ has an expected value of ${\displaystyle 0}$.
• Whitening the data requires a linear transformation ${\displaystyle \mathbf {L} :\mathbb {R} ^{N\times M}\to \mathbb {R} ^{N\times M}}$ of the centered data so that the components of ${\displaystyle \mathbf {L} (\mathbf {X} )}$ are uncorrelated and have variance one. More precisely, if ${\displaystyle \mathbf {X} }$ is a centered data matrix, the covariance of ${\displaystyle \mathbf {L} _{\mathbf {x} }:=\mathbf {L} (\mathbf {X} )}$ is the ${\displaystyle (N\times N)}$-dimensional identity matrix, that is,
${\displaystyle \mathrm {E} \left\{\mathbf {L} _{\mathbf {x} }\mathbf {L} _{\mathbf {x} }^{T}\right\}=\mathbf {I} _{N}}$
A common method for whitening is by performing an eigenvalue decomposition on the covariance matrix of the centered data ${\displaystyle \mathbf {X} }$, ${\displaystyle E\left\{\mathbf {X} \mathbf {X} ^{T}\right\}=\mathbf {E} \mathbf {D} \mathbf {E} ^{T}}$, where ${\displaystyle \mathbf {E} }$ is the matrix of eigenvectors and ${\displaystyle \mathbf {D} }$ is the diagonal matrix of eigenvalues. The whitened data matrix is defined thus by
${\displaystyle \mathbf {X} \leftarrow \mathbf {E} \mathbf {D} ^{-1/2}\mathbf {E} ^{T}\mathbf {X} .}$
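The centering and whitening steps above can be sketched in NumPy as follows (the helper name `prewhiten` is illustrative, not part of any standard library):

```python
import numpy as np

def prewhiten(X):
    """Center each row of X, then whiten via an eigendecomposition
    of the sample covariance, as described above."""
    # Centering: subtract the mean of each row (each mixed signal)
    X = X - X.mean(axis=1, keepdims=True)
    # Sample covariance of the centered data (N x N)
    cov = X @ X.T / X.shape[1]
    # Eigendecomposition: cov = E D E^T
    d, E = np.linalg.eigh(cov)
    # Whitening transform: X <- E D^{-1/2} E^T X
    return E @ np.diag(d ** -0.5) @ E.T @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1000)) * np.array([[1.0], [5.0], [0.1]])
Xw = prewhiten(X)
# Covariance of the whitened data is (up to floating point) the identity
print(np.allclose(Xw @ Xw.T / Xw.shape[1], np.eye(3)))  # True
```

Note that `np.linalg.eigh` assumes a symmetric matrix, which the covariance is by construction; the whitening transform is only defined when the covariance is full rank.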

### Single component extraction

The iterative algorithm finds the direction for the weight vector ${\displaystyle \mathbf {w} \in \mathbb {R} ^{N}}$ that maximizes a measure of non-Gaussianity of the projection ${\displaystyle \mathbf {w} ^{T}\mathbf {X} }$, with ${\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}}$ denoting a prewhitened data matrix as described above. Note that ${\displaystyle \mathbf {w} }$ is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function ${\displaystyle f(u)}$, its first derivative ${\displaystyle g(u)}$, and its second derivative ${\displaystyle g^{\prime }(u)}$. Hyvärinen states that the functions

${\displaystyle f(u)=\log \cosh(u),\quad g(u)=\tanh(u),\quad {\text{and}}\quad {g}'(u)=1-\tanh ^{2}(u),}$

are useful for general purposes, while

${\displaystyle f(u)=-e^{-u^{2}/2},\quad g(u)=ue^{-u^{2}/2},\quad {\text{and}}\quad {g}'(u)=(1-u^{2})e^{-u^{2}/2}}$

may be highly robust.[1] The steps for extracting the weight vector ${\displaystyle \mathbf {w} }$ for a single component in FastICA are the following:

1. Randomize the initial weight vector ${\displaystyle \mathbf {w} }$
2. Let ${\displaystyle \mathbf {w} ^{+}\leftarrow E\left\{\mathbf {X} g(\mathbf {w} ^{T}\mathbf {X} )^{T}\right\}-E\left\{g'(\mathbf {w} ^{T}\mathbf {X} )\right\}\mathbf {w} }$, where ${\displaystyle E\left\{...\right\}}$ denotes averaging over all column vectors of the matrix ${\displaystyle \mathbf {X} }$
3. Let ${\displaystyle \mathbf {w} \leftarrow \mathbf {w} ^{+}/\|\mathbf {w} ^{+}\|}$
4. If not converged, go back to 2
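A minimal NumPy sketch of this one-unit iteration, assuming a prewhitened input and the log-cosh contrast (${\displaystyle g=\tanh}$) from above (the function name and the convergence tolerance are illustrative choices):

```python
import numpy as np

def fastica_one_unit(X, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA (steps 1-4 above) on a prewhitened matrix X,
    using the log-cosh contrast: g(u) = tanh(u), g'(u) = 1 - tanh(u)**2."""
    N, M = X.shape
    rng = np.random.default_rng(seed)
    # Step 1: random initial weight vector, normalized to unit length
    w = rng.normal(size=N)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = np.tanh(w @ X)  # g(w^T X), shape (M,)
        # Step 2: w+ = E{X g(w^T X)} - E{g'(w^T X)} w
        w_new = X @ wx / M - (1.0 - wx ** 2).mean() * w
        # Step 3: renormalize to unit length
        w_new /= np.linalg.norm(w_new)
        # Step 4: stop once w no longer changes (up to sign)
        if abs(abs(w_new @ w) - 1.0) < tol:
            return w_new
        w = w_new
    return w
```

Since the sign of the recovered direction is arbitrary, convergence is checked up to sign via ${\displaystyle |\mathbf {w} ^{+T}\mathbf {w} |\to 1}$.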

### Multiple component extraction

The single unit iterative algorithm estimates only one weight vector, which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors; note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components, the simplest being the following. Here, ${\displaystyle \mathbf {1} }$ is a column vector of 1's of dimension ${\displaystyle M}$.

Algorithm FastICA

Input: ${\displaystyle C}$, the number of desired components
Input: ${\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}}$, a prewhitened matrix where each column is an ${\displaystyle N}$-dimensional sample and ${\displaystyle C\leq N}$
Output: ${\displaystyle \mathbf {W} \in \mathbb {R} ^{N\times C}}$, an un-mixing matrix where each column projects ${\displaystyle \mathbf {X} }$ onto an independent component
Output: ${\displaystyle \mathbf {S} \in \mathbb {R} ^{C\times M}}$, the independent-components matrix, where each of the ${\displaystyle M}$ columns is a ${\displaystyle C}$-dimensional sample

for p in 1 to C:
    ${\displaystyle \mathbf {w_{p}} \leftarrow }$ random vector of length ${\displaystyle N}$
    while ${\displaystyle \mathbf {w_{p}} }$ changes:
        ${\displaystyle \mathbf {w_{p}} \leftarrow {\frac {1}{M}}\mathbf {X} g(\mathbf {w_{p}} ^{T}\mathbf {X} )^{T}-{\frac {1}{M}}g'(\mathbf {w_{p}} ^{T}\mathbf {X} )\mathbf {1} \mathbf {w_{p}} }$
        ${\displaystyle \mathbf {w_{p}} \leftarrow \mathbf {w_{p}} -\sum _{j=1}^{p-1}\left(\mathbf {w_{p}} ^{T}\mathbf {w_{j}} \right)\mathbf {w_{j}} }$
        ${\displaystyle \mathbf {w_{p}} \leftarrow {\frac {\mathbf {w_{p}} }{\|\mathbf {w_{p}} \|}}}$

Output: ${\displaystyle \mathbf {W} ={\begin{bmatrix}\mathbf {w_{1}} ,\dots ,\mathbf {w_{C}} \end{bmatrix}}}$
Output: ${\displaystyle \mathbf {S} =\mathbf {W} ^{T}\mathbf {X} }$
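The pseudocode above translates to NumPy roughly as follows; this is a deflation scheme with Gram-Schmidt decorrelation, where the log-cosh nonlinearity and the tolerance used to detect that ${\displaystyle \mathbf {w_{p}} }$ has stopped changing are assumptions, not part of the pseudocode:

```python
import numpy as np

def fastica(X, C, max_iter=200, tol=1e-8, seed=0):
    """Deflation FastICA following the pseudocode above.
    X must be prewhitened (N x M); returns (W, S) with S = W^T X."""
    N, M = X.shape
    rng = np.random.default_rng(seed)
    W = np.zeros((N, C))
    for p in range(C):
        w = rng.normal(size=N)          # random vector of length N
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wx = np.tanh(w @ X)
            # Fixed-point update: (1/M) X g(w^T X)^T - (1/M) g'(w^T X) 1 w
            w_new = X @ wx / M - (1.0 - wx ** 2).mean() * w
            # Gram-Schmidt: remove projections onto earlier components
            w_new -= W[:, :p] @ (W[:, :p].T @ w_new)
            w_new /= np.linalg.norm(w_new)
            if abs(abs(w_new @ w) - 1.0) < tol:  # w no longer changes
                break
            w = w_new
        W[:, p] = w_new
    return W, W.T @ X  # un-mixing matrix and estimated sources
```

The decorrelation step keeps the columns of ${\displaystyle \mathbf {W} }$ orthonormal, so each new vector converges to a component different from those already found.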