# Welch's t-test

In statistics, Welch's t-test (or unequal variances t-test) is a two-sample location test used to test the null hypothesis that two populations have equal means. It is an adaptation of Student's t-test[1] and is more reliable when the two samples have unequal variances and possibly unequal sample sizes.[2] These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Because Welch's t-test has been less popular than Student's t-test[2] and may be less familiar to readers, a more informative name is "Welch's unequal variances t-test", or "unequal variances t-test" for brevity.

## Assumptions

Student's t-test assumes that the two populations are normally distributed with equal variances. Welch's t-test is designed for unequal variances, but it retains the assumption of normality.[1] Welch's t-test is an approximate solution to the Behrens–Fisher problem, that of comparing the means of two normal populations whose variances are unknown and not assumed to be equal.

## Calculations

Welch's t-test defines the statistic t by the following formula:

$t \quad = \quad {\; \overline{X}_1 - \overline{X}_2 \; \over \sqrt{ \; {s_1^2 \over N_1} \; + \; {s_2^2 \over N_2} \quad }}\,$

where $\overline{X}_{i}$, $s_{i}^{2}$ and $N_{i}$ are the $i$-th sample mean, sample variance (computed with divisor $N_i - 1$) and sample size, respectively, for $i = 1, 2$. Unlike in Student's t-test, the denominator is not based on a pooled variance estimate.
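A minimal Python sketch of the statistic, assuming each sample is given as a plain list of numbers. The helper name `welch_t` is hypothetical; `statistics.variance` from the standard library supplies the sample variance with divisor $N_i - 1$. The data are Example 1 from the Examples section below.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Welch's t statistic for two independent samples (hypothetical helper)."""
    n1, n2 = len(sample1), len(sample2)
    # Each sample variance (divisor N_i - 1) is divided by its own sample size
    se_sq = statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2
    # No pooling: the two variance terms are simply added
    return (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(se_sq)


# Example 1 samples A1 and A2
a1 = [27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6, 23.1, 19.6, 19.0, 21.7, 21.4]
a2 = [27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2, 21.9, 22.1, 22.9, 20.5, 24.4]
print(round(welch_t(a1, a2), 2))
```

The printed value should agree with the Welch $t$ reported for Example 1 in the results table.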

The degrees of freedom $\nu$ associated with this variance estimate is approximated using the Welch–Satterthwaite equation:

$\nu \quad \approx \quad {{\left( \; {s_1^2 \over N_1} \; + \; {s_2^2 \over N_2} \; \right)^2 } \over { \quad {s_1^4 \over N_1^2 \nu_1} \; + \; {s_2^4 \over N_2^2 \nu_2 } \quad }}$

Here $\nu_i = N_i - 1$ is the number of degrees of freedom associated with the $i$-th variance estimate, for $i = 1, 2$.
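The Welch–Satterthwaite approximation needs only the summary statistics, as in this small Python sketch (the function name `welch_df` is hypothetical). With equal variances and equal sample sizes $N$, the formula reduces to $2(N - 1) = N_1 + N_2 - 2$, the degrees of freedom of Student's t-test.

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite degrees of freedom from sample variances and sizes."""
    g1 = var1 / n1  # s_1^2 / N_1
    g2 = var2 / n2  # s_2^2 / N_2
    # Denominator terms s_i^4 / (N_i^2 * nu_i) with nu_i = N_i - 1
    return (g1 + g2) ** 2 / (g1 ** 2 / (n1 - 1) + g2 ** 2 / (n2 - 1))


print(welch_df(4.0, 15, 4.0, 15))  # equal variances and sizes: 2 * (15 - 1)
print(welch_df(9.0, 10, 0.9, 20))  # rounded summary statistics of Example 2
```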

Welch's t-test can also be calculated on ranked data, in which case it may be called Welch's U-test.[3]
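One way to sketch this in Python, assuming that both samples are ranked together as one pooled sample and that tied values receive the average of their ranks; reference [3] may specify the details differently. The names `midranks` and `welch_u` are hypothetical.

```python
import math
import statistics


def midranks(values):
    """Ranks 1..n of the values, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def welch_t(a, b):
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def welch_u(a, b):
    """Welch's t statistic computed on the pooled ranks (hypothetical helper)."""
    ranks = midranks(list(a) + list(b))
    return welch_t(ranks[:len(a)], ranks[len(a):])


print(midranks([3.0, 1.0, 3.0, 2.0]))  # [3.5, 1.0, 3.5, 2.0]
print(welch_u([1.2, 3.4, 2.2], [5.0, 4.1, 6.3]))
```

Because only ranks enter the statistic, any monotone transformation of the data (for example, taking logarithms of positive values) leaves the result unchanged.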

## Statistical test

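Once $t$ and $\nu$ have been computed, these statistics can be used with the t-distribution to test the null hypothesis that the two population means are equal (using a two-tailed test), or, with a one-tailed test, the null hypothesis that one population mean is less than or equal to the other against the alternative that it is greater. Because $\nu$ is generally not an integer, it is rounded down to the nearest integer when critical values are read from tables indexed by integer degrees of freedom; statistical software such as R uses the non-integer value directly, as the $\nu$ values in the results table below show.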
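Computing a P-value requires the tail probability of the t-distribution. Libraries such as SciPy provide it directly; the standard-library-only Python sketch below instead uses the identity $P(|T| > |t|) = I_{\nu/(\nu + t^2)}(\nu/2,\, 1/2)$, where $I_x(a, b)$ is the regularized incomplete beta function, evaluated with the classic continued-fraction (modified Lentz) method. It accepts non-integer $\nu$. The function names are hypothetical. For a one-tailed test, the P-value is half the two-sided value when $t$ lies in the direction of the alternative hypothesis.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the continued fraction directly where it converges quickly, else the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, nu):
    """Two-tailed P-value for statistic t with (possibly non-integer) nu degrees of freedom."""
    return reg_inc_beta(nu / 2.0, 0.5, nu / (nu + t * t))


print(t_two_sided_p(-2.4554, 24.99))  # Welch statistic of Example 1, before rounding
```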

Welch's t-test is more robust than Student's t-test and maintains type I error rates close to nominal when variances are unequal and when sample sizes are unequal. Furthermore, its power comes close to that of Student's t-test even when the population variances are equal and the sample sizes are balanced.[2]

Pre-testing for equal variances and then choosing between Student's t-test and Welch's t-test is not recommended.[4] Instead, Welch's t-test can be applied directly, without substantial disadvantage relative to Student's t-test, as noted above. Welch's t-test remains robust for skewed distributions when sample sizes are large.[5] Its reliability decreases for skewed distributions with smaller samples; in that case, one could perform Welch's t-test on ranked data.[3]

## Examples

The following three examples compare Welch's t-test and Student's t-test. The samples were drawn at random from normal distributions using the R programming language.

For all three examples, the population means were $\mu_{1}$ = 20 and $\mu_{2}$ = 22.

The first example is for equal variances ($\sigma_{1}^2$ = $\sigma_{2}^2$ = 4) and equal sample sizes ($N_{1}$ = $N_{2}$ = 15). Let A1 and A2 denote two random samples:

$A1 = \{27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6, 23.1, 19.6, 19.0, 21.7, 21.4\}$

$A2 = \{27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2, 21.9, 22.1, 22.9, 20.5, 24.4\}$

The second example is for unequal variances ($\sigma_{1}^2$ = 16, $\sigma_{2}^2$ = 1) and unequal sample sizes ($N_{1}$ = 10, $N_{2}$ = 20). The smaller sample has the larger variance:

$A1 = \{17.2, 20.9, 22.6, 18.1, 21.7, 21.4, 23.5, 24.2, 14.7, 21.8\}$

$A2 = \{21.5, 22.8, 21.0, 23.0, 21.6, 23.6, 22.5, 20.7, 23.4, 21.8, 20.7, 21.7, 21.5, 22.5, 23.6, 21.5, 22.5, 23.5, 21.5, 21.8\}$

The third example is for unequal variances ($\sigma_{1}^2$ = 1, $\sigma_{2}^2$ = 16) and unequal sample sizes ($N_{1}$ = 10, $N_{2}$ = 20). The larger sample has the larger variance:

$A1 = \{19.8, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 22.0\}$

$A2 = \{28.2, 26.6, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7, 23.2, 17.5, 20.6, 18.0, 23.9, 21.6, 24.3, 20.4, 24.0, 13.2\}$

Reference P-values ($P_{sim}$) were obtained by simulating the distributions of the t statistics under the null hypothesis of equal population means ($\mu_{1} - \mu_{2} = 0$). Results are summarised in the table below, with two-tailed P-values:

| Example | $N_{1}$ | $\overline{X}_{1}$ | $s_{1}^{2}$ | $N_{2}$ | $\overline{X}_{2}$ | $s_{2}^{2}$ | Student $t$ | Student $\nu$ | Student $P$ | Student $P_{sim}$ | Welch $t$ | Welch $\nu$ | Welch $P$ | Welch $P_{sim}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 15 | 20.8 | 7.9 | 15 | 23.0 | 3.8 | -2.46 | 28 | 0.021 | 0.021 | -2.46 | 25.0 | 0.021 | 0.017 |
| 2 | 10 | 20.6 | 9.0 | 20 | 22.1 | 0.9 | -2.10 | 28 | 0.045 | 0.150 | -1.57 | 9.9 | 0.149 | 0.144 |
| 3 | 10 | 19.4 | 1.4 | 20 | 21.6 | 17.1 | -1.64 | 28 | 0.110 | 0.036 | -2.22 | 24.5 | 0.036 | 0.042 |
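The $t$ and $\nu$ columns can be recomputed from the listed samples with a short Python sketch (the helper names `student` and `welch` are hypothetical). Because the samples are printed to one decimal place, recomputed values may differ from the table in the last digit.

```python
import math
import statistics

# Samples A1 and A2 for Examples 1-3, as listed above
EXAMPLES = {
    1: ([27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6, 23.1, 19.6, 19.0, 21.7, 21.4],
        [27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2, 21.9, 22.1, 22.9, 20.5, 24.4]),
    2: ([17.2, 20.9, 22.6, 18.1, 21.7, 21.4, 23.5, 24.2, 14.7, 21.8],
        [21.5, 22.8, 21.0, 23.0, 21.6, 23.6, 22.5, 20.7, 23.4, 21.8,
         20.7, 21.7, 21.5, 22.5, 23.6, 21.5, 22.5, 23.5, 21.5, 21.8]),
    3: ([19.8, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 22.0],
        [28.2, 26.6, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7,
         23.2, 17.5, 20.6, 18.0, 23.9, 21.6, 24.3, 20.4, 24.0, 13.2]),
}


def student(a, b):
    """Student's t with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch(a, b):
    """Welch's t and Welch-Satterthwaite degrees of freedom."""
    g1 = statistics.variance(a) / len(a)
    g2 = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(g1 + g2)
    nu = (g1 + g2) ** 2 / (g1 ** 2 / (len(a) - 1) + g2 ** 2 / (len(b) - 1))
    return t, nu


for example, (a, b) in EXAMPLES.items():
    t_s, df_s = student(a, b)
    t_w, df_w = welch(a, b)
    print(f"Example {example}: Student t = {t_s:.2f} (df {df_s}), Welch t = {t_w:.2f} (df {df_w:.1f})")
```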

Welch's t-test and Student's t-test gave practically identical results for the two samples with equal variances and equal sample sizes (Example 1). For unequal variances, Student's t-test gave a P-value that was too low relative to the simulated reference when the smaller sample had the larger variance (Example 2: 0.045 versus 0.150), and one that was too high when the larger sample had the larger variance (Example 3: 0.110 versus 0.036). In both cases, Welch's t-test gave P-values close to the simulated ones.
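The simulated reference P-values can be approximated with a Monte Carlo sketch in Python. Since the text does not describe the simulation setup, this assumes that samples are drawn from normal populations with equal means and the population variances stated for each example; the P-value is then the fraction of simulated statistics at least as extreme as the observed one. The function names, the seed and the number of replications are arbitrary choices.

```python
import math
import random


def _mean_var(x):
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(x)
    m = sum(x) / n
    return m, sum((v - m) ** 2 for v in x) / (n - 1)


def welch_t(a, b):
    m1, v1 = _mean_var(a)
    m2, v2 = _mean_var(b)
    return (m1 - m2) / math.sqrt(v1 / len(a) + v2 / len(b))


def simulated_p(t_obs, n1, n2, sigma1, sigma2, reps=20000, seed=1):
    """Two-tailed Monte Carlo P-value of t_obs under the null hypothesis mu1 == mu2."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        # The statistic depends only on the mean difference, so a common mean of 0 suffices
        a = [rng.gauss(0.0, sigma1) for _ in range(n1)]
        b = [rng.gauss(0.0, sigma2) for _ in range(n2)]
        if abs(welch_t(a, b)) >= abs(t_obs):
            extreme += 1
    return extreme / reps


# Example 2: sigma1 = 4 (variance 16), sigma2 = 1, N1 = 10, N2 = 20, Welch t = -1.57
print(simulated_p(-1.57, 10, 20, 4.0, 1.0))
```

Replacing `welch_t` with a pooled-variance statistic in the loop would give the corresponding reference value for Student's t-test. With 20,000 replications the Monte Carlo standard error is roughly 0.0025 for a P-value near 0.15.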

## Software implementations

| Language/Program | Function | Notes |
|---|---|---|
| LibreOffice | TTEST(Data1; Data2; Mode; Type) | Type = 3 selects unequal variances. See [1] |
| MATLAB | ttest2(data1, data2, 'Vartype', 'unequal') | See [2] |
| Microsoft Excel pre 2010 | TTEST(array1, array2, tails, type) | type = 3 selects unequal variances. See [3] |
| Microsoft Excel 2010 and later | T.TEST(array1, array2, tails, type) | type = 3 selects unequal variances. See [4] |
| Python | scipy.stats.ttest_ind(a, b, axis=0, equal_var=False) | See [5] |
| R | t.test(data1, data2, alternative="two.sided", var.equal=FALSE) | var.equal=FALSE is the default. See [6] |
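As an illustration of the Python entry above, a short sketch applying SciPy's `ttest_ind` to the Example 2 samples; it assumes SciPy is installed. Passing `equal_var=True` instead gives Student's t-test, so both rows of the comparison can be produced with the same call.

```python
from scipy import stats

# Example 2 samples (smaller sample, larger variance)
a1 = [17.2, 20.9, 22.6, 18.1, 21.7, 21.4, 23.5, 24.2, 14.7, 21.8]
a2 = [21.5, 22.8, 21.0, 23.0, 21.6, 23.6, 22.5, 20.7, 23.4, 21.8,
      20.7, 21.7, 21.5, 22.5, 23.6, 21.5, 22.5, 23.5, 21.5, 21.8]

# equal_var=False selects Welch's unequal variances t-test (two-sided by default)
welch = stats.ttest_ind(a1, a2, equal_var=False)
# equal_var=True gives Student's pooled-variance t-test for comparison
student = stats.ttest_ind(a1, a2, equal_var=True)

print(f"Welch:   t = {welch.statistic:.2f}, P = {welch.pvalue:.3f}")
print(f"Student: t = {student.statistic:.2f}, P = {student.pvalue:.3f}")
```

The printed values should be close to the Example 2 row of the results table.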