EigenMoments

Figure: Signal space is transformed into moment space (Geometric Moments), then into noise space, in which the axes with the lowest noise are retained, and finally into feature space.

EigenMoments[1] are a set of moments that are orthogonal, robust to noise, invariant to rotation, scaling and translation, and sensitive to the distribution of the signal. They are used in signal processing and computer vision as descriptors of a signal or image, and the descriptors can later be used for classification.

They are obtained by orthogonalizing geometric moments via eigen analysis.[2]

Framework summary

EigenMoments are computed by performing eigen analysis on the moment space of an image, maximizing the signal-to-noise ratio (SNR) in the feature space in the form of a Rayleigh quotient.

This approach has several benefits in image processing applications:

1. The moments in the moment space depend on the distribution of the images being transformed, which ensures that the final feature space is decorrelated after eigen analysis on the moment space.
2. Because EigenMoments take the distribution of the images into account, they are versatile and adaptable to different genres of images.
3. The generated moment kernels are orthogonal, which makes analysis on the moment space easier. Transforming an image into moment space with orthogonal moment kernels is analogous to projecting the image onto a number of orthogonal axes.
4. Noisy components can be removed, which makes EigenMoments robust for classification applications.
5. Optimal information compaction can be obtained, so only a few moments are needed to characterize the images.

Problem formulation

Assume that a signal vector ${\displaystyle s\in {\mathcal {R}}^{n}}$ is taken from a certain distribution having correlation ${\displaystyle C\in {\mathcal {R}}^{n\times n}}$, i.e. ${\displaystyle C=E[ss^{T}]}$, where E[.] denotes the expected value.

The dimension of the signal space, n, is often too large to be useful for practical applications such as pattern classification, so we need to transform the signal space into a space of lower dimensionality.

This is performed by a two-step linear transformation:

${\displaystyle q=W^{T}X^{T}s,}$

where ${\displaystyle q=[q_{1},...,q_{k}]^{T}\in {\mathcal {R}}^{k}}$ is the transformed signal, ${\displaystyle X=[x_{1},...,x_{m}]\in {\mathcal {R}}^{n\times m}}$ is a fixed transformation matrix that transforms the signal into the moment space, and ${\displaystyle W=[w_{1},...,w_{k}]\in {\mathcal {R}}^{m\times k}}$ is the transformation matrix that we are going to determine by maximizing the SNR of the feature space in which ${\displaystyle q}$ resides. For the case of Geometric Moments, the columns of ${\displaystyle X}$ would be the monomials. If ${\displaystyle m=k=n}$, a full-rank transformation would result; however, usually we have ${\displaystyle m\leq n}$ and ${\displaystyle k\leq m}$. This is especially the case when ${\displaystyle n}$ is large.
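The shapes involved in the two-step transformation can be illustrated with a short NumPy sketch. The dimensions, the random signal and the random `W` are placeholders chosen only for illustration; in the actual method `W` is determined by the SNR maximization described next.

```python
import numpy as np

n, m, k = 64, 6, 4                      # signal, moment and feature dimensions (illustrative)
x = np.linspace(-1.0, 1.0, n)           # sample positions on [-1, 1]
X = np.vander(x, N=m, increasing=True)  # n x m, columns are 1, x, ..., x^(m-1)

rng = np.random.default_rng(0)
s = rng.standard_normal(n)              # placeholder signal
W = rng.standard_normal((m, k))         # placeholder; not the optimized W

M = X.T @ s                             # moment space, shape (m,)
q = W.T @ M                             # feature space, shape (k,)
```

Here `X.T @ s` is a plain discrete sum over the samples; any scaling by the sample spacing is omitted.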

We seek the ${\displaystyle W}$ that maximizes the SNR of the feature space:

${\displaystyle SNR_{transform}={\frac {w^{T}X^{T}CXw}{w^{T}X^{T}NXw}},}$

where N is the correlation matrix of the noise signal. The problem can thus be formulated as

${\displaystyle \{w_{1},...,w_{k}\}={\underset {w}{\operatorname {arg\,max} }}{\frac {w^{T}X^{T}CXw}{w^{T}X^{T}NXw}}}$

subject to constraints:

${\displaystyle w_{i}^{T}X^{T}NXw_{j}=\delta _{ij},}$ where ${\displaystyle \delta _{ij}}$ is the Kronecker delta.

This maximization has the form of a Rayleigh quotient: letting ${\displaystyle A=X^{T}CX}$ and ${\displaystyle B=X^{T}NX}$, it can be written as:

${\displaystyle \{w_{1},...,w_{k}\}={\underset {w}{\operatorname {arg\,max} }}{\frac {w^{T}Aw}{w^{T}Bw}}}$, ${\displaystyle w_{i}^{T}Bw_{j}=\delta _{ij}}$

Rayleigh quotient

Optimization of Rayleigh quotient[3][4] has the form:

${\displaystyle \max _{w}R(w)=\max _{w}{\frac {w^{T}Aw}{w^{T}Bw}}}$

Here ${\displaystyle A}$ and ${\displaystyle B}$ are both symmetric, and ${\displaystyle B}$ is positive definite and therefore invertible. Scaling ${\displaystyle w}$ does not change the value of the objective function, so an additional scalar constraint ${\displaystyle w^{T}Bw=1}$ can be imposed on ${\displaystyle w}$ without losing any solution when the objective function is optimized.
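The scale invariance of the quotient can be checked numerically. The sketch below uses random symmetric matrices as stand-ins for ${\displaystyle X^{T}CX}$ and ${\displaystyle X^{T}NX}$; the helper name `rayleigh_quotient` and the test matrices are illustrative assumptions.

```python
import numpy as np

def rayleigh_quotient(w, A, B):
    """Return R(w) = (w^T A w) / (w^T B w)."""
    return float(w @ A @ w) / float(w @ B @ w)

rng = np.random.default_rng(0)
m = 5
G = rng.standard_normal((m, m))
H = rng.standard_normal((m, m))
A = G @ G.T                      # symmetric stand-in for X^T C X
B = H @ H.T + m * np.eye(m)      # symmetric positive definite stand-in for X^T N X

w = rng.standard_normal(m)
w_unit = w / np.sqrt(w @ B @ w)  # rescaled so that w^T B w = 1

r_original = rayleigh_quotient(w, A, B)
r_scaled = rayleigh_quotient(3.7 * w, A, B)
r_unit = rayleigh_quotient(w_unit, A, B)
```

All three values agree up to round-off, which is why the constraint ${\displaystyle w^{T}Bw=1}$ costs nothing.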

The resulting constrained optimization problem can be solved using a Lagrange multiplier:

${\displaystyle \max _{w}{w^{T}Aw}}$ subject to ${\displaystyle {w^{T}Bw}=1}$

${\displaystyle \max _{w}{\mathcal {L}}(w)=\max _{w}(w^{T}Aw-\lambda w^{T}Bw)}$

Setting the first derivative with respect to ${\displaystyle w}$ to zero gives:

${\displaystyle Aw=\lambda Bw}$

which is an instance of the generalized eigenvalue problem (GEP). The GEP has the form:

${\displaystyle Aw=\lambda Bw}$

For any pair ${\displaystyle (w,\lambda )}$ that is a solution to the above equation, ${\displaystyle w}$ is called a generalized eigenvector and ${\displaystyle \lambda }$ is called a generalized eigenvalue.

Finding ${\displaystyle w}$ and ${\displaystyle \lambda }$ that satisfy this equation yields the result that optimizes the Rayleigh quotient.
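The step from the Lagrangian to the eigenvalue equation can be spelled out as follows, using the symmetry of ${\displaystyle A}$ and ${\displaystyle B}$:

${\displaystyle {\frac {\partial {\mathcal {L}}}{\partial w}}=2Aw-2\lambda Bw=0\;\Longrightarrow \;Aw=\lambda Bw.}$

Left-multiplying by ${\displaystyle w^{T}}$ shows that, at any such solution, the Rayleigh quotient equals the generalized eigenvalue:

${\displaystyle w^{T}Aw=\lambda w^{T}Bw\;\Longrightarrow \;R(w)={\frac {w^{T}Aw}{w^{T}Bw}}=\lambda ,}$

so the maximum of ${\displaystyle R(w)}$ is attained at the eigenvector with the largest generalized eigenvalue.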

One way of maximizing the Rayleigh quotient is therefore to solve the generalized eigenvalue problem. Dimension reduction can be performed by choosing the ${\displaystyle k}$ components ${\displaystyle w_{i}}$, ${\displaystyle i=1,...,k}$, with the highest values of ${\displaystyle R(w)}$ out of the ${\displaystyle m}$ components and discarding the rest. This transformation can be interpreted as rotating and scaling the moment space, transforming it into a feature space with maximized SNR; the first ${\displaystyle k}$ components are therefore the components with the ${\displaystyle k}$ highest SNR values.
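As a minimal sketch, the generalized eigenvalue problem can be solved with NumPy alone by reducing it to an ordinary symmetric eigenproblem through a Cholesky factorization of ${\displaystyle B}$. This reduction is a standard technique chosen here for illustration, not something prescribed by the text; the function name and the random test matrices are assumptions.

```python
import numpy as np

def top_k_generalized_eigenvectors(A, B, k):
    """Solve A w = lambda B w for symmetric A and positive definite B.

    Returns the k largest generalized eigenvalues (descending) and the
    matching eigenvectors as columns, normalized so that W^T B W = I.
    """
    L = np.linalg.cholesky(B)                 # B = L L^T
    L_inv = np.linalg.inv(L)
    S = L_inv @ A @ L_inv.T                   # ordinary symmetric problem S y = lambda y
    S = 0.5 * (S + S.T)                       # remove round-off asymmetry
    eigvals, Y = np.linalg.eigh(S)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    W = L_inv.T @ Y[:, order]                 # map back: w = L^{-T} y
    return eigvals[order], W

rng = np.random.default_rng(1)
m, k = 6, 3
G = rng.standard_normal((m, m))
H = rng.standard_normal((m, m))
A = G @ G.T                                   # stand-in for X^T C X
B = H @ H.T + m * np.eye(m)                   # stand-in for X^T N X
lam, W = top_k_generalized_eigenvectors(A, B, k)
```

The returned columns satisfy the constraint ${\displaystyle w_{i}^{T}Bw_{j}=\delta _{ij}}$, and each eigenvalue in `lam` is the value of ${\displaystyle R(w)}$ for the corresponding column.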

Another way to look at this solution is to use the concept of simultaneous diagonalization instead of the generalized eigenvalue problem.

Simultaneous diagonalization

1. Let ${\displaystyle A=X^{T}CX}$ and ${\displaystyle B=X^{T}NX}$ as mentioned earlier. We can write ${\displaystyle W}$ as two separate transformation matrices:

${\displaystyle W=W_{1}W_{2}.}$

2. ${\displaystyle W_{1}}$ can be found by first diagonalizing ${\displaystyle B}$:

${\displaystyle P^{T}BP=D_{B}}$,

where ${\displaystyle D_{B}}$ is a diagonal matrix containing the eigenvalues of ${\displaystyle B}$ sorted in increasing order. Since ${\displaystyle B}$ is positive definite, ${\displaystyle D_{B}>0}$. At this stage we can discard the eigenvectors that have large eigenvalues and retain those whose eigenvalues are close to 0, since the energy of the noise is close to 0 along those directions.

Let ${\displaystyle {\hat {P}}}$ be the first ${\displaystyle k}$ columns of ${\displaystyle P}$; then ${\displaystyle {\hat {P}}^{T}B{\hat {P}}={\hat {D_{B}}}}$, where ${\displaystyle {\hat {D_{B}}}}$ is the ${\displaystyle k\times k}$ principal submatrix of ${\displaystyle D_{B}}$.

3. Let

${\displaystyle W_{1}={\hat {P}}{\hat {D_{B}}}^{-1/2}}$

and hence:

${\displaystyle W_{1}^{T}BW_{1}=({\hat {P}}{\hat {D_{B}}}^{-1/2})^{T}B({\hat {P}}{\hat {D_{B}}}^{-1/2})=I}$.

${\displaystyle W_{1}}$ whitens ${\displaystyle B}$ and reduces the dimensionality from ${\displaystyle m}$ to ${\displaystyle k}$. The transformed space in which ${\displaystyle q'=W_{1}^{T}X^{T}s}$ resides is called the noise space.

4. Then, we diagonalize ${\displaystyle W_{1}^{T}AW_{1}}$:

${\displaystyle W_{2}^{T}W_{1}^{T}AW_{1}W_{2}=D_{A}}$,

where ${\displaystyle W_{2}^{T}W_{2}=I}$. ${\displaystyle D_{A}}$ is the matrix with the eigenvalues of ${\displaystyle W_{1}^{T}AW_{1}}$ on its diagonal. We may retain all the eigenvalues and their corresponding eigenvectors, since most of the noise has already been discarded in the previous step.

5. Finally, the transformation is given by:

${\displaystyle W=W_{1}W_{2}}$

where ${\displaystyle W}$ diagonalizes both the numerator and denominator of the SNR,

${\displaystyle W^{T}AW=D_{A}}$, ${\displaystyle W^{T}BW=I}$, and the transformation of the signal ${\displaystyle s}$ is defined as ${\displaystyle q=W^{T}X^{T}s=W_{2}^{T}W_{1}^{T}X^{T}s}$.
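The five steps above can be sketched in NumPy as follows. The function name, the random test matrices and the choice to order the final columns by decreasing SNR are assumptions made for this illustration.

```python
import numpy as np

def simultaneous_diagonalization(A, B, k):
    """Return W = W1 W2 with W^T B W = I and W^T A W diagonal."""
    # Step 2: diagonalize B; eigh returns eigenvalues in ascending order.
    d_B, P = np.linalg.eigh(B)
    P_hat, d_B_hat = P[:, :k], d_B[:k]        # keep the k smallest noise eigenvalues
    # Step 3: whiten B in the retained subspace.
    W1 = P_hat / np.sqrt(d_B_hat)             # same as P_hat @ diag(d_B_hat ** -0.5)
    # Step 4: diagonalize the whitened signal matrix with an orthogonal W2.
    d_A, W2 = np.linalg.eigh(W1.T @ A @ W1)
    order = np.argsort(d_A)[::-1]             # assumed ordering: largest SNR first
    d_A, W2 = d_A[order], W2[:, order]
    # Step 5: combine the two transformations.
    return W1 @ W2, d_A

rng = np.random.default_rng(2)
m, k = 6, 4
G = rng.standard_normal((m, m))
H = rng.standard_normal((m, m))
A = G @ G.T                                   # stand-in for X^T C X
B = H @ H.T + m * np.eye(m)                   # stand-in for X^T N X
W, d_A = simultaneous_diagonalization(A, B, k)
```

Dividing `P_hat` by `np.sqrt(d_B_hat)` scales each retained column, which avoids forming the diagonal matrix explicitly.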

Information loss

To quantify the information loss when we discard some of the eigenvalues and eigenvectors, we can perform the following analysis:

${\displaystyle {\begin{array}{lll}\eta &=&1-{\frac {\operatorname {trace} (W_{1}^{T}AW_{1})}{\operatorname {trace} (D_{B}^{-1/2}P^{T}APD_{B}^{-1/2})}}\\&=&1-{\frac {\operatorname {trace} ({\hat {D_{B}}}^{-1/2}{\hat {P}}^{T}A{\hat {P}}{\hat {D_{B}}}^{-1/2})}{\operatorname {trace} (D_{B}^{-1/2}P^{T}APD_{B}^{-1/2})}}\end{array}}}$
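The ratio above can be evaluated directly. The sketch below assumes the eigenvalues of ${\displaystyle B}$ are sorted in increasing order, so that the retained part is the leading ${\displaystyle k\times k}$ block of the fully whitened matrix; the function name and test matrices are illustrative.

```python
import numpy as np

def information_loss(A, B, k):
    """Return eta = 1 - trace(W1^T A W1) / trace(D_B^{-1/2} P^T A P D_B^{-1/2})."""
    d_B, P = np.linalg.eigh(B)                # ascending eigenvalues of B
    W_full = P / np.sqrt(d_B)                 # P @ D_B^{-1/2}, all m axes kept
    full = W_full.T @ A @ W_full
    kept = full[:k, :k]                       # equals W1^T A W1 for the first k axes
    return 1.0 - np.trace(kept) / np.trace(full)

rng = np.random.default_rng(3)
m = 6
G = rng.standard_normal((m, m))
H = rng.standard_normal((m, m))
A = G @ G.T                                   # stand-in for X^T C X
B = H @ H.T + m * np.eye(m)                   # stand-in for X^T N X
losses = [information_loss(A, B, k) for k in range(1, m + 1)]
```

With all ${\displaystyle m}$ axes retained the loss is zero, and it cannot increase as ${\displaystyle k}$ grows when ${\displaystyle A}$ is positive semi-definite.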

Eigenmoments

Eigenmoments are derived by applying the above framework to Geometric Moments. They can be derived for both 1D and 2D signals.

1D signal

If we let ${\displaystyle X=[1,x,x^{2},...,x^{m-1}]}$, i.e. the monomials, then after the transformation ${\displaystyle X^{T}}$ we obtain the Geometric Moments, denoted by the vector ${\displaystyle M}$, of the signal ${\displaystyle s=[s(x)]}$, i.e. ${\displaystyle M=X^{T}s}$.

In practice it is difficult to estimate the correlation of the signal because of an insufficient number of samples; therefore parametric approaches are used.

One such model can be defined as:

${\displaystyle r(x_{1},x_{2})=r(0,0)e^{-c(x_{1}-x_{2})^{2}}}$,

where ${\displaystyle r(0,0)=E[tr(ss^{T})]}$. This correlation model can be replaced by other models; however, it covers general natural images.

Figure: Plot of the parametric model, which predicts correlations in the input signal.

Since ${\displaystyle r(0,0)}$ does not affect the maximization, it can be dropped:

${\displaystyle A=X^{T}CX=\int _{-1}^{1}\int _{-1}^{1}[x_{1}^{j}x_{2}^{i}e^{-c(x_{1}-x_{2})^{2}}]_{i,j=0}^{i,j=m-1}dx_{1}dx_{2}}$

The correlation of the noise can be modelled as ${\displaystyle \sigma _{n}^{2}\delta (x_{1},x_{2})}$, where ${\displaystyle \sigma _{n}^{2}}$ is the energy of the noise. Again, ${\displaystyle \sigma _{n}^{2}}$ can be dropped because the constant does not have any effect on the maximization problem.

${\displaystyle B=X^{T}NX=\int _{-1}^{1}\int _{-1}^{1}[x_{1}^{j}x_{2}^{i}\delta (x_{1},x_{2})]_{i,j=0}^{i,j=m-1}dx_{1}dx_{2}}$

${\displaystyle B=X^{T}NX=\int _{-1}^{1}[x_{1}^{j+i}]_{i,j=0}^{i,j=m-1}dx_{1}=X^{T}X}$

Using the computed ${\displaystyle A}$ and ${\displaystyle B}$ and applying the algorithm discussed in the previous section, we find ${\displaystyle W}$ and the set of transformed monomials ${\displaystyle \Phi =[\phi _{1},...,\phi _{k}]=XW}$, which are the moment kernels of EM. The moment kernels of EM decorrelate the correlation in the image,

${\displaystyle \Phi ^{T}C\Phi =(XW)^{T}C(XW)=D_{C}}$,

and are orthogonal:

${\displaystyle {\begin{array}{lll}\Phi ^{T}\Phi &=&(XW)^{T}(XW)\\&=&W^{T}X^{T}XW\\&=&W^{T}X^{T}NXW\\&=&W^{T}BW\\&=&I\\\end{array}}}$

Example computation

Taking ${\displaystyle c=0.5}$, the dimension of moment space as ${\displaystyle m=6}$ and the dimension of feature space as ${\displaystyle k=4}$, we will have:

${\displaystyle W=\left({\begin{array}{cccc}0.0&0&-0.7745&-0.8960\\2.8669&-4.4622&0.0&0.0\\0.0&0.0&7.9272&2.4523\\-4.0225&20.6505&0.0&0.0\\0.0&0.0&-9.2789&-0.1239\\-0.5092&-18.4582&0.0&0.0\end{array}}\right)}$

and

${\displaystyle {\begin{array}{lll}\phi _{1}&=&2.8669x-4.0225x^{3}-0.5092x^{5}\\\phi _{2}&=&-4.4622x+20.6505x^{3}-18.4582x^{5}\\\phi _{3}&=&-0.7745+7.9272x^{2}-9.2789x^{4}\\\phi _{4}&=&-0.8960+2.4523x^{2}-0.1239x^{4}\\\end{array}}}$
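A numerical version of this example might be set up as below. This is a sketch under assumptions that are not in the text: the integrals are approximated with a 40-point Gauss-Legendre rule, the retained axes are those with the ${\displaystyle k}$ smallest eigenvalues of ${\displaystyle B}$ (as in the simultaneous-diagonalization steps), and the columns are ordered by decreasing SNR. Sign, ordering and retention conventions affect the result, so the output is not guaranteed to match the matrix ${\displaystyle W}$ printed above.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

c, m, k = 0.5, 6, 4                                   # values from the example above

# Gauss-Legendre nodes and weights on [-1, 1] (an assumed choice of quadrature rule).
nodes, weights = leggauss(40)
V = np.vander(nodes, N=m, increasing=True)            # V[a, i] = nodes[a] ** i
Vw = V * weights[:, None]                             # quadrature-weighted monomials

# A[i, j] ~ double integral of x1^j x2^i exp(-c (x1 - x2)^2); r(0, 0) is dropped.
K = np.exp(-c * (nodes[:, None] - nodes[None, :]) ** 2)
A = Vw.T @ K @ Vw

# B[i, j] ~ integral of x^(i + j), i.e. the Gram matrix X^T X of the monomials.
B = V.T @ Vw

# Simultaneous diagonalization: whiten B on its k smallest eigenvalues, then rotate.
d_B, P = np.linalg.eigh(B)
W1 = P[:, :k] / np.sqrt(d_B[:k])
d_A, W2 = np.linalg.eigh(W1.T @ A @ W1)
W = W1 @ W2[:, ::-1]                                  # columns ordered by decreasing SNR

Phi = V @ W                                           # kernels phi_1..phi_k sampled at the nodes
```

The orthogonality relation ${\displaystyle \Phi ^{T}\Phi =I}$ can then be checked in the quadrature sense with `(Phi * weights[:, None]).T @ Phi`.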

2D signal

The derivation for a 2D signal is the same as for a 1D signal, except that conventional Geometric Moments are directly employed to obtain the set of 2D EigenMoments.

The definition of Geometric Moments of order ${\displaystyle (p+q)}$ for a 2D image signal is:

${\displaystyle m_{pq}=\int _{-1}^{1}\int _{-1}^{1}x^{p}y^{q}f(x,y)dxdy}$,

which can be denoted as ${\displaystyle M=\{m_{j,i}\}_{i,j=0}^{i,j=m-1}}$. The set of 2D EigenMoments is then:

${\displaystyle \Omega =W^{T}MW}$,

where ${\displaystyle \Omega =\{\Omega _{j,i}\}_{i,j=0}^{i,j=k-1}}$ is a matrix that contains the set of EigenMoments, with entries

${\displaystyle \Omega _{j,i}=\sum _{r=0}^{m-1}\sum _{s=0}^{m-1}w_{r,j}w_{s,i}m_{r,s}}$.
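A sketch of the 2D computation for a sampled image follows. It assumes the image is mapped onto ${\displaystyle [-1,1]^{2}}$ with rows indexing ${\displaystyle y}$ and columns indexing ${\displaystyle x}$, approximates the integrals with a midpoint Riemann sum, and uses a random placeholder for ${\displaystyle W}$; the function name is hypothetical.

```python
import numpy as np

def geometric_moments_2d(f, m):
    """Approximate m_pq on [-1, 1]^2 for p, q < m with a midpoint Riemann sum.

    Assumes rows of f index y and columns index x.
    """
    ny, nx = f.shape
    x = -1.0 + (np.arange(nx) + 0.5) * (2.0 / nx)     # pixel centres in x
    y = -1.0 + (np.arange(ny) + 0.5) * (2.0 / ny)     # pixel centres in y
    Xp = np.vander(x, N=m, increasing=True)           # Xp[a, p] = x_a ** p
    Yq = np.vander(y, N=m, increasing=True)           # Yq[b, q] = y_b ** q
    area = (2.0 / nx) * (2.0 / ny)                    # area of one pixel
    return Xp.T @ f.T @ Yq * area                     # M[p, q] ~ m_pq

rng = np.random.default_rng(4)
m, k = 6, 4
f = rng.random((32, 48))                              # placeholder image
M = geometric_moments_2d(f, m)
W = rng.standard_normal((m, k))                       # placeholder; not the optimized W

Omega = W.T @ M @ W                                   # matrix form
Omega_sum = np.einsum("rj,si,rs->ji", W, W, M)        # element-wise double sum
```

The matrix product and the explicit double sum give the same ${\displaystyle k\times k}$ result.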

EigenMoment invariants (EMI)

In order to obtain a set of moment invariants, we can use normalized Geometric Moments ${\displaystyle {\hat {M}}}$ instead of ${\displaystyle M}$.

Normalized Geometric Moments are invariant to rotation, scaling and translation, and are defined by:

${\displaystyle {\begin{array}{lll}{\hat {m}}_{pq}&=&\alpha ^{p+q+2}\int _{-1}^{1}\int _{-1}^{1}[(x-x^{c})\cos(\theta )+(y-y^{c})\sin(\theta )]^{p}\\&&\times [-(x-x^{c})\sin(\theta )+(y-y^{c})\cos(\theta )]^{q}\\&&\times f(x,y)\,dx\,dy,\\\end{array}}}$

where ${\displaystyle (x^{c},y^{c})=(m_{10}/m_{00},m_{01}/m_{00})}$ is the centroid of the image ${\displaystyle f(x,y)}$ and

${\displaystyle {\begin{array}{lll}\alpha &=&[m_{00}^{S}/m_{00}]^{1/2}\\\theta &=&{\frac {1}{2}}\tan ^{-1}{\frac {2m_{11}}{m_{20}-m_{02}}}\end{array}}}$.

${\displaystyle m_{00}^{S}}$ in this equation is a scaling factor depending on the image. ${\displaystyle m_{00}^{S}}$ is usually set to 1 for binary images.
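The normalization can be sketched for a sampled image as follows. Several choices here are assumptions rather than statements from the text: the integrals are replaced by a midpoint Riemann sum on ${\displaystyle [-1,1]^{2}}$, the second-order moments in the formula for ${\displaystyle \theta }$ are taken about the centroid, and `np.arctan2` is used in place of a plain inverse tangent to avoid division by zero. The function name and the test image are hypothetical.

```python
import numpy as np

def normalized_moment(f, p, q, m00_s=1.0):
    """Approximate the normalized geometric moment of order (p, q) of image f."""
    ny, nx = f.shape
    x = -1.0 + (np.arange(nx) + 0.5) * (2.0 / nx)
    y = -1.0 + (np.arange(ny) + 0.5) * (2.0 / ny)
    xx, yy = np.meshgrid(x, y)                  # xx[b, a] = x_a, yy[b, a] = y_b
    area = (2.0 / nx) * (2.0 / ny)

    def raw(r, s, u=xx, v=yy):
        # Riemann-sum approximation of the integral of u^r v^s f.
        return float(np.sum(u ** r * v ** s * f) * area)

    m00 = raw(0, 0)
    xc, yc = raw(1, 0) / m00, raw(0, 1) / m00   # centroid
    dx, dy = xx - xc, yy - yc
    alpha = np.sqrt(m00_s / m00)                # scale factor
    # Assumption: second-order moments about the centroid are used for theta.
    theta = 0.5 * np.arctan2(2.0 * raw(1, 1, dx, dy),
                             raw(2, 0, dx, dy) - raw(0, 2, dx, dy))
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return alpha ** (p + q + 2) * raw(p, q, u, v)

f = np.zeros((40, 40))
f[5:15, 20:35] = 1.0                            # binary rectangle, off-centre
shifted = np.roll(f, shift=(7, -10), axis=(0, 1))  # same rectangle, translated
```

For this image the zeroth-order normalized moment equals ${\displaystyle m_{00}^{S}}$, the first-order normalized moments vanish, and `normalized_moment(f, 2, 0)` agrees with `normalized_moment(shifted, 2, 0)`, illustrating translation invariance.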