# Control variates

The control variates method is a variance reduction technique used in Monte Carlo methods. It exploits information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity.[1] [2][3]

## Underlying principle

Let the unknown parameter of interest be ${\displaystyle \mu }$, and assume we have a statistic ${\displaystyle m}$ such that the expected value of m is μ: ${\displaystyle \mathbb {E} \left[m\right]=\mu }$, i.e. m is an unbiased estimator for μ. Suppose we calculate another statistic ${\displaystyle t}$ such that ${\displaystyle \mathbb {E} \left[t\right]=\tau }$ is a known value. Then

${\displaystyle m^{\star }=m+c\left(t-\tau \right)\,}$

is also an unbiased estimator for ${\displaystyle \mu }$ for any choice of the coefficient ${\displaystyle c}$. The variance of the resulting estimator ${\displaystyle m^{\star }}$ is

${\displaystyle {\textrm {Var}}\left(m^{\star }\right)={\textrm {Var}}\left(m\right)+c^{2}\,{\textrm {Var}}\left(t\right)+2c\,{\textrm {Cov}}\left(m,t\right).}$

By differentiating the above expression with respect to ${\displaystyle c}$, it can be shown that choosing the optimal coefficient

${\displaystyle c^{\star }=-{\frac {{\textrm {Cov}}\left(m,t\right)}{{\textrm {Var}}\left(t\right)}}}$

minimizes the variance of ${\displaystyle m^{\star }}$. (Note that this coefficient is the same as the coefficient obtained from a linear regression.) With this choice,

{\displaystyle {\begin{aligned}{\textrm {Var}}\left(m^{\star }\right)&={\textrm {Var}}\left(m\right)-{\frac {\left[{\textrm {Cov}}\left(m,t\right)\right]^{2}}{{\textrm {Var}}\left(t\right)}}\\&=\left(1-\rho _{m,t}^{2}\right){\textrm {Var}}\left(m\right)\end{aligned}}}

where

${\displaystyle \rho _{m,t}={\textrm {Corr}}\left(m,t\right)\,}$

is the correlation coefficient of ${\displaystyle m}$ and ${\displaystyle t}$. The greater the value of ${\displaystyle \vert \rho _{m,t}\vert }$, the greater the variance reduction achieved.

In the case that ${\displaystyle {\textrm {Cov}}\left(m,t\right)}$, ${\displaystyle {\textrm {Var}}\left(t\right)}$, and/or ${\displaystyle \rho _{m,t}\;}$ are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain least squares system; therefore this technique is also known as regression sampling.

When the expectation of the control variable, ${\displaystyle \mathbb {E} \left[t\right]=\tau }$, is not known analytically, it is still possible to increase the precision in estimating ${\displaystyle \mu }$ (for a given fixed simulation budget), provided that the two conditions are met: 1) evaluating ${\displaystyle t}$ is significantly cheaper than computing ${\displaystyle m}$; 2) the magnitude of the correlation coefficient ${\displaystyle |\rho _{m,t}|}$ is close to unity. [3]

## Example

We would like to estimate

${\displaystyle I=\int _{0}^{1}{\frac {1}{1+x}}\,\mathrm {d} x}$

using Monte Carlo integration. This integral is the expected value of ${\displaystyle f(U)}$, where

${\displaystyle f(U)={\frac {1}{1+U}}}$

and U follows a uniform distribution [0, 1]. Using a sample of size n denote the points in the sample as ${\displaystyle u_{1},\cdots ,u_{n}}$. Then the estimate is given by

${\displaystyle I\approx {\frac {1}{n}}\sum _{i}f(u_{i}).}$

Now we introduce ${\displaystyle g(U)=1+U}$ as a control variate with a known expected value ${\displaystyle \mathbb {E} \left[g\left(U\right)\right]=\int _{0}^{1}(1+x)\,\mathrm {d} x={\tfrac {3}{2}}}$ and combine the two into a new estimate

${\displaystyle I\approx {\frac {1}{n}}\sum _{i}f(u_{i})+c\left({\frac {1}{n}}\sum _{i}g(u_{i})-3/2\right).}$

Using ${\displaystyle n=1500}$ realizations and an estimated optimal coefficient ${\displaystyle c^{\star }\approx 0.4773}$ we obtain the following results

 Estimate Variance Classical estimate 0.69475 0.01947 Control variates 0.69295 0.00060

The variance was significantly reduced after using the control variates technique. (The exact result is ${\displaystyle I=\ln 2\approx 0.69314718}$.)