# Biweight midcorrelation

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. Because it is based on the median rather than the mean, it is less sensitive to outliers and can serve as a robust alternative to other similarity metrics, such as Pearson correlation or mutual information.[1]

## Derivation

Here we find the biweight midcorrelation of two vectors ${\displaystyle x}$ and ${\displaystyle y}$, each with ${\displaystyle m}$ items indexed by ${\displaystyle i=1,2,\ldots ,m}$, so that the items are ${\displaystyle x_{1},x_{2},\ldots ,x_{m}}$ and ${\displaystyle y_{1},y_{2},\ldots ,y_{m}}$. First, we define ${\displaystyle \operatorname {med} (x)}$ as the median of a vector ${\displaystyle x}$ and ${\displaystyle \operatorname {mad} (x)}$ as its median absolute deviation (MAD), then define ${\displaystyle u_{i}}$ and ${\displaystyle v_{i}}$ as,

${\displaystyle {\begin{aligned}u_{i}&={\frac {x_{i}-\operatorname {med} (x)}{9\operatorname {mad} (x)}},\\v_{i}&={\frac {y_{i}-\operatorname {med} (y)}{9\operatorname {mad} (y)}}.\end{aligned}}}$

Now we define the weights ${\displaystyle w_{i}^{(x)}}$ and ${\displaystyle w_{i}^{(y)}}$ as,

${\displaystyle {\begin{aligned}w_{i}^{(x)}&=\left(1-u_{i}^{2}\right)^{2}I\left(1-|u_{i}|\right),\\w_{i}^{(y)}&=\left(1-v_{i}^{2}\right)^{2}I\left(1-|v_{i}|\right),\end{aligned}}}$

where ${\displaystyle I}$ is the indicator function,

${\displaystyle I(x)={\begin{cases}1,&{\text{if }}x>0\\0,&{\text{otherwise}}\end{cases}}}$
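As a rough illustration of the weighting step, the following Python sketch computes the weights for a single vector. The function name `biweight_weights` is hypothetical, NumPy is an assumed dependency, and the MAD is taken here as the raw median of absolute deviations from the median, with no consistency constant and no handling of a zero MAD.

```python
import numpy as np

def biweight_weights(x):
    """Return the biweight weights w_i for a 1-D sample (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # Raw MAD: median of absolute deviations from the median (assumed > 0 here).
    mad = np.median(np.abs(x - med))
    u = (x - med) / (9.0 * mad)
    # (1 - u^2)^2 where |u| < 1, and 0 elsewhere; this is the role of I(1 - |u|).
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)

# The value 100 lies far more than 9 MADs from the median, so its weight is 0,
# while the median itself (3) receives the maximum weight of 1.
w = biweight_weights([1.0, 2.0, 3.0, 4.0, 100.0])
print(w)
```

In this example the median is 3 and the MAD is 1, so any point more than 9 units from the median is excluded entirely by the weights.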

Then we weight the median-centered values and normalize them so that each resulting vector has unit Euclidean norm (the sum of its squared entries is 1):

${\displaystyle {\begin{aligned}{\tilde {x}}_{i}&={\frac {\left(x_{i}-\operatorname {med} (x)\right)w_{i}^{(x)}}{\sqrt {\sum _{j=1}^{m}\left[(x_{j}-\operatorname {med} (x))w_{j}^{(x)}\right]^{2}}}},\\{\tilde {y}}_{i}&={\frac {\left(y_{i}-\operatorname {med} (y)\right)w_{i}^{(y)}}{\sqrt {\sum _{j=1}^{m}\left[(y_{j}-\operatorname {med} (y))w_{j}^{(y)}\right]^{2}}}}.\end{aligned}}}$

Finally, we define biweight midcorrelation as,

${\displaystyle \mathrm {bicor} \left(x,y\right)=\sum _{i=1}^{m}{\tilde {x}}_{i}{\tilde {y}}_{i}}$

## Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,[2] and is often used for weighted correlation network analysis.

## Implementations

Biweight midcorrelation has been implemented in the R statistical programming language as the function bicor, which is part of the WGCNA package.[3]

## References

1. ^ Wilcox, Rand (12 January 2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press. p. 455. ISBN 978-0123869838.
2. ^ Song, Lin (9 December 2012). "Comparison of co-expression measures: mutual information, correlation, and model based indices". BMC Bioinformatics. 13 (328). doi:10.1186/1471-2105-13-328. PMC 3586947. PMID 23217028.
3. ^ Langfelder, Peter. "WGCNA: Weighted Correlation Network Analysis (an R package)". CRAN. Retrieved 2018-04-06.