# Control variates

The control variates method is a variance reduction technique used in Monte Carlo methods. It exploits information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity. 

## Underlying principle

Let the unknown parameter of interest be $\mu$, and assume we have a statistic $m$ such that the expected value of $m$ is $\mu$: $\mathbb {E} \left[m\right]=\mu$, i.e. $m$ is an unbiased estimator for $\mu$. Suppose we calculate another statistic $t$ such that $\mathbb {E} \left[t\right]=\tau$ is a known value. Then

$$m^{\star }=m+c\left(t-\tau \right)$$

is also an unbiased estimator for $\mu$ for any choice of the coefficient $c$. The variance of the resulting estimator $m^{\star }$ is

$${\textrm {Var}}\left(m^{\star }\right)={\textrm {Var}}\left(m\right)+c^{2}\,{\textrm {Var}}\left(t\right)+2c\,{\textrm {Cov}}\left(m,t\right).$$

By differentiating the above expression with respect to $c$, it can be shown that choosing the optimal coefficient

$$c^{\star }=-{\frac {{\textrm {Cov}}\left(m,t\right)}{{\textrm {Var}}\left(t\right)}}$$

minimizes the variance of $m^{\star }$, and that with this choice,

$${\begin{aligned}{\textrm {Var}}\left(m^{\star }\right)&={\textrm {Var}}\left(m\right)-{\frac {\left[{\textrm {Cov}}\left(m,t\right)\right]^{2}}{{\textrm {Var}}\left(t\right)}}\\&=\left(1-\rho _{m,t}^{2}\right){\textrm {Var}}\left(m\right)\end{aligned}}$$

where $\rho _{m,t}={\textrm {Corr}}\left(m,t\right)$ is the correlation coefficient of $m$ and $t$. The greater the value of $\vert \rho _{m,t}\vert$, the greater the variance reduction achieved.
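The differentiation step can be written out explicitly. The following is a worked sketch that uses only the quantities defined above and assumes ${\textrm {Var}}\left(t\right)>0$ and ${\textrm {Var}}\left(m\right)>0$:

$${\begin{aligned}{\frac {\mathrm {d} }{\mathrm {d} c}}{\textrm {Var}}\left(m^{\star }\right)&=2c\,{\textrm {Var}}\left(t\right)+2\,{\textrm {Cov}}\left(m,t\right)=0\quad \Longrightarrow \quad c^{\star }=-{\frac {{\textrm {Cov}}\left(m,t\right)}{{\textrm {Var}}\left(t\right)}},\\{\frac {\mathrm {d} ^{2}}{\mathrm {d} c^{2}}}{\textrm {Var}}\left(m^{\star }\right)&=2\,{\textrm {Var}}\left(t\right)>0,\\{\textrm {Var}}\left(m^{\star }\right){\Big |}_{c=c^{\star }}&={\textrm {Var}}\left(m\right)+{\frac {\left[{\textrm {Cov}}\left(m,t\right)\right]^{2}}{{\textrm {Var}}\left(t\right)}}-2\,{\frac {\left[{\textrm {Cov}}\left(m,t\right)\right]^{2}}{{\textrm {Var}}\left(t\right)}}={\textrm {Var}}\left(m\right)-{\frac {\left[{\textrm {Cov}}\left(m,t\right)\right]^{2}}{{\textrm {Var}}\left(t\right)}}.\end{aligned}}$$

The positive second derivative confirms that the stationary point is a minimum. The form involving the correlation coefficient follows from $\rho _{m,t}^{2}=\left[{\textrm {Cov}}\left(m,t\right)\right]^{2}/\left({\textrm {Var}}\left(m\right)\,{\textrm {Var}}\left(t\right)\right)$.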

In the case that ${\textrm {Cov}}\left(m,t\right)$, ${\textrm {Var}}\left(t\right)$, and/or $\rho _{m,t}$ are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain least squares system (a regression of $m$ on $t$ across the replicates); therefore this technique is also known as regression sampling.
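The estimation step and its least squares interpretation can be illustrated with a minimal Python sketch. It assumes NumPy is available; the function names `optimal_coefficient` and `control_variate_estimate` and the synthetic data are hypothetical and chosen only for illustration. The coefficient is estimated from the sample covariance and variance, and compared with the negative slope of an ordinary least squares fit of $m$ on $t$.

```python
import numpy as np

def optimal_coefficient(m_reps, t_reps):
    """Estimate c* = -Cov(m, t) / Var(t) from paired replicates."""
    cov = np.cov(m_reps, t_reps)  # 2x2 sample covariance matrix
    return -cov[0, 1] / cov[1, 1]

def control_variate_estimate(m_reps, t_reps, tau):
    """Return (estimate, c_hat) for the estimator m + c (t - tau)."""
    c_hat = optimal_coefficient(m_reps, t_reps)
    estimate = np.mean(m_reps) + c_hat * (np.mean(t_reps) - tau)
    return estimate, c_hat

# Synthetic replicates (illustrative assumption): t has known mean tau = 2,
# and m is correlated with t and has true mean 5.
rng = np.random.default_rng(0)
t_reps = rng.normal(loc=2.0, scale=1.0, size=1000)
m_reps = 5.0 + 3.0 * (t_reps - 2.0) + rng.normal(scale=0.5, size=1000)

estimate, c_hat = control_variate_estimate(m_reps, t_reps, tau=2.0)

# Least squares view: the fitted slope of m on t equals Cov(m, t) / Var(t),
# so the estimated coefficient is the negative of that slope.
slope = np.polyfit(t_reps, m_reps, 1)[0]
print(estimate, c_hat, -slope)
```

Because the coefficient is estimated from the same replicates that are averaged, the resulting estimator is in general only approximately unbiased; the effect shrinks as the number of replicates grows.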

When the expectation of the control variable, $\mathbb {E} \left[t\right]=\tau$, is not known analytically, it is still possible to increase the precision in estimating $\mu$ (for a given fixed simulation budget), provided that two conditions are met: 1) evaluating $t$ is significantly cheaper than computing $m$; 2) the magnitude of the correlation coefficient $|\rho _{m,t}|$ is close to unity.

## Example

We would like to estimate

$$I=\int _{0}^{1}{\frac {1}{1+x}}\,\mathrm {d} x$$

using Monte Carlo integration. This integral is the expected value of $f(U)$, where

$$f(U)={\frac {1}{1+U}}$$

and $U$ is uniformly distributed on $[0,1]$. Denote the points of a sample of size $n$ by $u_{1},\ldots ,u_{n}$. Then the estimate is given by

$$I\approx {\frac {1}{n}}\sum _{i}f(u_{i}).$$

Now we introduce $g(U)=1+U$ as a control variate with a known expected value $\mathbb {E} \left[g\left(U\right)\right]=\int _{0}^{1}(1+x)\,\mathrm {d} x={\tfrac {3}{2}}$ and combine the two into a new estimate

$$I\approx {\frac {1}{n}}\sum _{i}f(u_{i})+c\left({\frac {1}{n}}\sum _{i}g(u_{i})-{\tfrac {3}{2}}\right).$$

Using $n=1500$ realizations and an estimated optimal coefficient $c^{\star }\approx 0.4773$, we obtain the following results:

| | Estimate | Variance |
|---|---|---|
| Classical estimate | 0.69475 | 0.01947 |
| Control variates | 0.69295 | 0.00060 |

Using the control variates technique reduced the variance by a factor of about 32 (from 0.01947 to 0.00060). (The exact result is $I=\ln 2\approx 0.69314718$.)
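The example can be reproduced approximately with a short Python sketch, assuming NumPy is available. The seed and variable names are arbitrary choices made here, so the printed figures will differ slightly from those quoted above. The variances computed are per-sample variances of the summands, which is an assumption about how the quoted figures were obtained (it is consistent with ${\textrm {Var}}\left(f(U)\right)={\tfrac {1}{2}}-(\ln 2)^{2}\approx 0.0195$).

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n = 1500
u = rng.uniform(0.0, 1.0, size=n)

f = 1.0 / (1.0 + u)   # integrand evaluated at the sample points
g = 1.0 + u           # control variate
tau = 1.5             # known expectation E[g(U)] = 3/2

# Estimated optimal coefficient c* = -Cov(f, g) / Var(g)
cov = np.cov(f, g)
c_hat = -cov[0, 1] / cov[1, 1]

classical = f.mean()
cv_terms = f + c_hat * (g - tau)   # per-sample control-variate terms
controlled = cv_terms.mean()

var_classical = f.var(ddof=1)
var_controlled = cv_terms.var(ddof=1)

print(f"c_hat            = {c_hat:.4f}")
print(f"classical        = {classical:.5f} (variance {var_classical:.5f})")
print(f"control variates = {controlled:.5f} (variance {var_controlled:.5f})")
print(f"exact ln 2       = {np.log(2):.8f}")
```

Dividing either per-sample variance by $n$ gives the corresponding estimate of the variance of the sample mean itself.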