# Biweight midcorrelation

Jump to navigation Jump to search

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is median-based, rather than mean-based, thus is less sensitive to outliers, and can be a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.

## Derivation

Here we find the biweight midcorrelation of two vectors $x$ and $y$ , with $i=1,2,\ldots ,m$ items, representing each item in the vector as $x_{1},x_{2},\ldots ,x_{m}$ and $y_{1},y_{2},\ldots ,y_{m}$ . First, we define $\operatorname {med} (x)$ as the median of a vector $x$ and $\operatorname {mad} (x)$ as the median absolute deviation (MAD), then define $u_{i}$ and $v_{i}$ as,

{\begin{aligned}u_{i}&={\frac {x_{i}-\operatorname {med} (x)}{9\operatorname {mad} (x)}},\\v_{i}&={\frac {y_{i}-\operatorname {med} (y)}{9\operatorname {mad} (y)}}.\end{aligned}} Now we define the weights $w_{i}^{(x)}$ and $w_{i}^{(y)}$ as,

{\begin{aligned}w_{i}^{(x)}&=\left(1-u_{i}^{2}\right)^{2}I\left(1-|u_{i}|\right)\\w_{i}^{(y)}&=\left(1-v_{i}^{2}\right)^{2}I\left(1-|v_{i}|\right)\end{aligned}} where $I$ is the identity function where,

$I(x)={\begin{cases}1,&{\text{if }}x>0\\0,&{\text{otherwise}}\end{cases}}$ Then we normalize so that the sum of the weights is 1:

{\begin{aligned}{\tilde {x}}_{i}&={\frac {\left(x_{i}-\operatorname {med} (x)\right)w_{i}^{(x)}}{\sqrt {\sum _{j=1}^{m}\left[(x_{j}-\operatorname {med} (x))w_{j}^{(x)}\right]^{2}}}}\\{\tilde {y}}_{i}&={\frac {\left(y_{i}-\operatorname {med} (y)\right)w_{i}^{(y)}}{\sqrt {\sum _{j=1}^{m}\left[(y_{j}-\operatorname {med} (y))w_{j}^{(y)}\right]^{2}}}}.\end{aligned}} Finally, we define biweight midcorrelation as,

$\mathrm {bicor} \left(x,y\right)=\sum _{i=1}^{m}{\tilde {x}}_{i}{\tilde {y}}_{i}$ ## Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks, and is often used for weighted correlation network analysis.

## Implementations

Biweight midcorrelation has been implemented in the R statistical programming language as the function bicor as part of the WGCNA package