# Cramér's V

In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φc) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.[1]

## Usage and interpretation

φc is the intercorrelation of two discrete variables[2] and may be used with variables having two or more levels. φc is a symmetrical measure: it does not matter which variable is placed in the columns and which in the rows. The order of the rows or columns does not matter either, so φc may be used with nominal data types or higher (notably ordered or numerical).
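The symmetry described above can be checked numerically. The following is a minimal sketch in plain Python; the helper name `cramers_v` and the example counts are illustrative assumptions, and the helper assumes a table with no empty rows or columns.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x k table of counts (list of lists).

    Illustrative helper; assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    r, k = len(table), len(table[0])
    return math.sqrt((chi2 / n) / min(k - 1, r - 1))

# Made-up 2 x 3 table of counts.
table = [[12, 5, 3], [4, 9, 7]]

# Swap the roles of rows and columns.
transposed = [list(col) for col in zip(*table)]
# Reorder the rows, then the columns.
rows_swapped = [table[1], table[0]]
cols_permuted = [[row[2], row[0], row[1]] for row in table]

v = cramers_v(table)
print(math.isclose(v, cramers_v(transposed)))
print(math.isclose(v, cramers_v(rows_swapped)))
print(math.isclose(v, cramers_v(cols_permuted)))
```

All three comparisons should agree up to floating-point rounding, since the chi-squared sum runs over the same cells in a different order.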

Cramér's V may also be applied to goodness-of-fit chi-squared models when there is a 1 × k table (in this case r = 1). Here k is taken as the number of possible outcomes, and V functions as a measure of the tendency towards a single outcome.[citation needed]
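A rough sketch of the goodness-of-fit variant follows. It rests on two assumptions that the text does not spell out: the expected frequencies are uniform (n/k per outcome), and the divisor is k − 1, because min(k − 1, r − 1) would be zero when r = 1. The function name `goodness_of_fit_v` is hypothetical.

```python
import math

def goodness_of_fit_v(counts):
    """V for a 1 x k table of counts against a uniform expectation.

    Assumptions (not stated in the source): expected count n / k for
    every outcome, and normalisation by n * (k - 1).
    """
    n = sum(counts)
    k = len(counts)
    expected = n / k
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts)
    return math.sqrt(chi2 / (n * (k - 1)))

print(goodness_of_fit_v([20, 20, 20]))  # evenly spread outcomes
print(goodness_of_fit_v([60, 0, 0]))    # every observation in one outcome
```

Under these assumptions the value is 0 when the outcomes are evenly spread and 1 when every observation falls on a single outcome, which matches the "tendency towards a single outcome" reading.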

Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association). It can reach 1 only when the value of one variable is completely determined by the other; the categories themselves need not carry the same labels.

φc² is the mean square canonical correlation between the variables.[citation needed]

In the case of a 2 × 2 contingency table Cramér's V is equal to the Phi coefficient.
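The 2 × 2 case can be illustrated with a short sketch. The helper names and counts below are illustrative assumptions. Note that the signed form of the phi coefficient used here can be negative, whereas V is non-negative by construction, so the comparison is against the absolute value of phi.

```python
import math

def phi_coefficient(a, b, c, d):
    """Signed phi coefficient for the 2 x 2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

def cramers_v(table):
    """Cramér's V for a table of counts with positive marginal totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    r, k = len(table), len(table[0])
    return math.sqrt((chi2 / n) / min(k - 1, r - 1))

# Made-up counts for illustration.
a, b, c, d = 10, 20, 30, 40
phi = phi_coefficient(a, b, c, d)
v = cramers_v([[a, b], [c, d]])
print(phi, v, math.isclose(abs(phi), v))
```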

Note that because chi-squared values tend to increase with the number of cells, the greater the difference between r (rows) and k (columns), the more likely φc is to tend to 1 without strong evidence of a meaningful association.[citation needed]

V may be viewed as the association between two variables expressed as a percentage of their maximum possible variation.[citation needed]

## Calculation

Let a sample of size n of the jointly distributed variables ${\displaystyle A}$ and ${\displaystyle B}$ for ${\displaystyle i=1,\ldots ,r;j=1,\ldots ,k}$ be given by the frequencies

${\displaystyle n_{ij}=}$ number of times the values ${\displaystyle (A_{i},B_{j})}$ were observed.

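Writing ${\displaystyle n_{i.}=\sum _{j}n_{ij}}$ and ${\displaystyle n_{.j}=\sum _{i}n_{ij}}$ for the row and column totals, the chi-squared statistic is: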

${\displaystyle \chi ^{2}=\sum _{i,j}{\frac {(n_{ij}-{\frac {n_{i.}n_{.j}}{n}})^{2}}{\frac {n_{i.}n_{.j}}{n}}}}$

Cramér's V is computed by dividing the chi-squared statistic by the sample size and by the minimum dimension minus 1, then taking the square root:

${\displaystyle V={\sqrt {\frac {\varphi ^{2}}{\min(k-1,r-1)}}}={\sqrt {\frac {\chi ^{2}/n}{\min(k-1,r-1)}}}}$

where:

• ${\displaystyle \varphi }$ is the phi coefficient,
• ${\displaystyle \chi ^{2}}$ is derived from Pearson's chi-squared test,
• ${\displaystyle n}$ is the grand total of observations,
• ${\displaystyle k}$ is the number of columns, and
• ${\displaystyle r}$ is the number of rows.
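The calculation above can be sketched in plain Python as follows. The function names and the example table are illustrative assumptions, and the code assumes every row and column total is positive so that no expected count is zero.

```python
import math

def chi_squared(table):
    """Return (chi-squared statistic, n) for an r x k table of counts."""
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramér's V: sqrt((chi2 / n) / min(k - 1, r - 1))."""
    chi2, n = chi_squared(table)
    r, k = len(table), len(table[0])
    return math.sqrt((chi2 / n) / min(k - 1, r - 1))

# Made-up 2 x 3 table: every expected count is 20, so chi2 = 20
# and V = sqrt(20 / 120 / 1) = sqrt(1/6).
table = [[10, 20, 30], [30, 20, 10]]
print(round(cramers_v(table), 4))  # 0.4082
```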

The p-value for the significance of V is the same one that is calculated using Pearson's chi-squared test.[citation needed]

The formula for the variance of V = φc is known.[3]

In R, the function cramersV() from the lsr package calculates V using the chisq.test function from the stats package.[4]
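For comparison, a Python counterpart can be sketched with SciPy, assuming SciPy 1.7 or later is installed (the `association` function was added in that release). The example table is made up. Passing `correction=False` to `chi2_contingency` matters for 2 × 2 tables, where SciPy otherwise applies Yates' continuity correction and the statistic no longer equals the plain Pearson chi-squared.

```python
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats.contingency import association

# Made-up 2 x 3 table of counts.
observed = np.array([[10, 20, 30], [30, 20, 10]])

# Pearson chi-squared test without continuity correction; the p-value
# is the one that applies to V as well.
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

# Cramér's V computed directly from the table.
v = association(observed, method="cramer")

print(chi2, dof, p_value, v)
```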

## Bias correction

Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association. A bias correction, using the above notation, is given by[5]

${\displaystyle {\tilde {V}}={\sqrt {\frac {{\tilde {\varphi }}^{2}}{\min({\tilde {k}}-1,{\tilde {r}}-1)}}}}$

where

${\displaystyle {\tilde {\varphi }}^{2}=\max \left(0,\varphi ^{2}-{\frac {(k-1)(r-1)}{n-1}}\right)}$

and

${\displaystyle {\tilde {k}}=k-{\frac {(k-1)^{2}}{n-1}}}$
${\displaystyle {\tilde {r}}=r-{\frac {(r-1)^{2}}{n-1}}}$

Then ${\displaystyle {\tilde {V}}}$ estimates the same population quantity as Cramér's V but with typically much smaller mean squared error. The rationale for the correction is that under independence, ${\displaystyle E[\varphi ^{2}]={\frac {(k-1)(r-1)}{n-1}}}$.[6]
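The correction can be sketched in plain Python as below. The function name and the example table are illustrative assumptions; the code assumes positive row and column totals and a sample large enough that the corrected dimensions stay above 1.

```python
import math

def bias_corrected_cramers_v(table):
    """Bias-corrected Cramér's V for an r x k table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    r, k = len(table), len(table[0])
    phi2 = chi2 / n
    # Subtract the expected value of phi^2 under independence, floored at 0.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Corrected table dimensions.
    k_tilde = k - (k - 1) ** 2 / (n - 1)
    r_tilde = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_tilde / min(k_tilde - 1, r_tilde - 1))

# Made-up table; the uncorrected V for these counts is sqrt(1/6).
table = [[10, 20, 30], [30, 20, 10]]
corrected = bias_corrected_cramers_v(table)
print(corrected, corrected < math.sqrt(1 / 6))
```

For this table the corrected value is somewhat smaller than the uncorrected one, consistent with the uncorrected estimator tending to overestimate the association.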