# Variable elimination

Variable elimination (VE) is a simple and general exact inference algorithm in probabilistic graphical models, such as Bayesian networks and Markov random fields.[1] It can be used for inference of maximum a posteriori (MAP) state or estimation of marginal distribution over a subset of variables. The algorithm has exponential time complexity, but could be efficient in practice for the low-treewidth graphs, if the proper elimination order is used.

## Factors

Enabling a key reduction in algorithmic complexity, a factor ${\displaystyle f}$, also known as a potential, of variables ${\displaystyle V}$ is a relation between each instantiation of ${\displaystyle v}$ of variables ${\displaystyle f}$ to a non-negative number, commonly denoted as ${\displaystyle f(x)}$.[2] A factor does not necessarily have a set interpretation. One may perform operations on factors of different representations such as a probability distribution or conditional distribution.[2] Joint distributions often become too large to handle as the complexity of this operation is exponential. Thus variable elimination becomes more feasible when computing factorized entities.

## Basic Operations

### Variable Summation

Algorithm 1, called sum-out (SO), or marginalization, eliminates a single variable ${\displaystyle v}$ from a set ${\displaystyle \phi }$ of factors,[3] and returns the resulting set of factors. The algorithm collect-relevant simply returns those factors in ${\displaystyle \phi }$ involving variable ${\displaystyle v}$.

Algorithm 1 sum-out(${\displaystyle v}$,${\displaystyle \phi }$)

${\displaystyle \Phi }$ = collect factors relevant to ${\displaystyle v}$
${\displaystyle \Psi }$ = the product of all factors in ${\displaystyle \Phi }$
${\displaystyle \tau =\sum _{v}\Psi }$

return ${\displaystyle (\phi -\Psi )\cup \{\tau \}}$

Example

Here we have a joint probability distribution. A variable, ${\displaystyle v}$ can be summed out between a set of instantiations where the set ${\displaystyle V-v}$ at minimum must agree over the remaining variables. The value of ${\displaystyle v}$ is irrelevant when it is the variable to be summed out. [2]

${\displaystyle V_{1}}$ ${\displaystyle V_{2}}$ ${\displaystyle V_{3}}$ ${\displaystyle V_{4}}$ ${\displaystyle V_{5}}$ ${\displaystyle Pr(.)}$
true true true false false 0.80
false true true false false 0.20

After eliminating ${\displaystyle V_{1}}$, its reference is excluded and we are left with a distribution only over the remaining variables and the sum of each instantiation.

${\displaystyle V_{2}}$ ${\displaystyle V_{3}}$ ${\displaystyle V_{4}}$ ${\displaystyle V_{5}}$ ${\displaystyle Pr(.)}$
true true false false 1.0

The resulting distribution which follows the sum-out operation only helps to answer queries that do not mention ${\displaystyle V_{1}}$.[2] Also worthy to note, the summing-out operation is commutative.

### Factor Multiplication

Computing a product between multiple factors results in a factor compatible with a single instantiation in each factor.[2]

Algorithm 2 mult-factors(${\displaystyle v}$,${\displaystyle \phi }$)[2]

${\displaystyle Z}$ = Union of all variables between product of factors ${\displaystyle f_{1}(X_{1}),...,f_{m}(X_{m})}$
${\displaystyle f}$ = a factor over ${\displaystyle f}$ where ${\displaystyle f}$ for all ${\displaystyle f}$
For each instantiation ${\displaystyle z}$
For 1 to ${\displaystyle m}$
${\displaystyle x_{1}=}$ instantiation of variables ${\displaystyle X_{1}}$ consistent with ${\displaystyle z}$
${\displaystyle f(z)=f(z)f_{i}(x_{i})}$
return ${\displaystyle f}$

Factor multiplication is not only commutative but also associative.

## Inference

The most common query type is in the form ${\displaystyle p(X|E=e)}$ where ${\displaystyle X}$ and ${\displaystyle E}$ are disjoint subsets of ${\displaystyle U}$, and ${\displaystyle E}$ is observed taking value ${\displaystyle e}$. A basic algorithm to computing p(X|E = e) is called variable elimination (VE), first put forth in.[1]

Taken from,[1] this algorithm computes ${\displaystyle p(X|E=e)}$ from a discrete Bayesian network B. VE calls SO to eliminate variables one by one. More specifically, in Algorithm 2, ${\displaystyle \phi }$ is the set C of CPTs for B, ${\displaystyle X}$ is a list of query variables, ${\displaystyle E}$ is a list of observed variables, ${\displaystyle e}$ is the corresponding list of observed values, and ${\displaystyle \sigma }$ is an elimination ordering for variables ${\displaystyle U-XE}$, where ${\displaystyle XE}$ denotes ${\displaystyle X\cup E}$.

Variable Elimination Algorithm VE(${\displaystyle \phi ,X,E,e,\sigma }$)

Multiply factors with appropriate CPTs While σ is not empty
Remove the first variable ${\displaystyle v}$ from ${\displaystyle \sigma }$
${\displaystyle \phi }$ = sum-out${\displaystyle (v,\phi )}$
${\displaystyle p(X,E=e)}$ = the product of all factors ${\displaystyle \Psi \in \phi }$

return ${\displaystyle p(X,E=e)/\sum _{X}p(X,E=e)}$

## Ordering

Finding the optimal order in which to eliminate variables is an NP-hard problem. As such there are heuristics one may follow to better optimize performance by order:

1. Minimum Degree: Eliminate the variable which results in constructing the smallest factor possible.[2]
2. Minimum Fill: By constructing an undirected graph showing variable relations expressed by all CPTs, eliminate the variable which would result in the least edges to be added post elimination.[2]

## References

1. ^ a b c Zhang, N.L., Poole, D.:A Simple Approach to Bayesian Network Computations.In: 7th Canadian Conference on Artificial Intelligence,pp. 171--178. Springer, New York (1994)
2. Darwiche, Adnan (2009-01-01). Modeling and Reasoning with Bayesian Networks. doi:10.1017/cbo9780511811357. ISBN 9780511811357.
3. ^ Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, MA (2009)