= Dual total correlation =

In information theory, dual total correlation (also called information rate, excess entropy, or binding information) is one of several known non-negative generalizations of mutual information. While total correlation is bounded by the sum of the entropies of the n elements, the dual total correlation is bounded by the joint entropy of the n elements. Although well behaved, dual total correlation has received much less attention than total correlation. A measure known as "TSE-complexity" defines a continuum between the total correlation and the dual total correlation.

==Definition==

For a set of n random variables $\{X_1,\ldots, X_n\}$, the dual total correlation $D(X_1,\ldots, X_n)$ is given by

$D(X_1,\ldots, X_n) = H\left( X_1, \ldots, X_n \right) - \sum_{i=1}^n H\left( X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n \right) ,$

where $H(X_{1},\ldots, X_{n})$ is the joint entropy of the variable set $\{X_{1},\ldots, X_{n}\}$ and $H(X_i \mid \cdots )$ is the conditional entropy of variable $X_{i}$, given the rest.
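The definition can be evaluated directly when the joint distribution is known. The following is a minimal sketch, assuming the joint probability mass function is stored as a NumPy array with one axis per variable and that entropies are measured in bits (the text does not fix the logarithm base). The function names are illustrative, and each conditional entropy is obtained through the chain rule $H(X_i \mid \text{rest}) = H(X_1,\ldots,X_n) - H(\text{rest})$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))

def dual_total_correlation(joint):
    """D = H(X_1..X_n) - sum_i H(X_i | all other variables)."""
    joint = np.asarray(joint, dtype=float)
    h_joint = entropy(joint)
    d = h_joint
    for i in range(joint.ndim):
        h_rest = entropy(joint.sum(axis=i))   # entropy of all variables except X_i
        d -= h_joint - h_rest                 # chain rule: H(X_i | rest)
    return d

# Example 1: X1, X2 are independent fair bits and X3 = X1 XOR X2.
xor = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        xor[a, b, a ^ b] = 0.25

# Example 2: three independent fair bits.
independent = np.full((2, 2, 2), 1 / 8)

print(dual_total_correlation(xor))          # 2 bits: every variable is fixed by the others
print(dual_total_correlation(independent))  # 0 bits: nothing is shared
```

In the XOR example each conditional entropy vanishes, so the dual total correlation equals the joint entropy, its largest possible value.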

==Normalized==
The dual total correlation can be normalized to the interval [0,1] by dividing it by its maximum value $H(X_{1}, \ldots, X_{n})$,

$ND(X_1,\ldots, X_n) = \frac{D(X_1,\ldots, X_n)}{H(X_1,\ldots, X_n)} .$
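As a worked illustration, consider the case $n = 2$. Writing $I(X_1; X_2)$ for the mutual information (standard notation that is not otherwise used in this article), the definition reduces to

$D(X_1, X_2) = H(X_1, X_2) - H(X_1 \mid X_2) - H(X_2 \mid X_1) = I(X_1; X_2), \qquad ND(X_1, X_2) = \frac{I(X_1; X_2)}{H(X_1, X_2)} .$

Assuming $H(X_1, X_2) > 0$, this ratio is 0 when the two variables are independent and 1 when each variable is a deterministic function of the other.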

==Relationship with Total Correlation==
Dual total correlation is non-negative and bounded above by the joint entropy $H(X_1, \ldots, X_n)$.

$0 \leq D(X_1, \ldots, X_n) \leq H(X_1, \ldots, X_n) .$

Dual total correlation also has a close relationship with total correlation, $C(X_1, \ldots, X_n)$, and can be written in terms of the total correlation of the whole set and of all its subsets of size $n-1$:

$D(\textbf{X}) = (n-1)C(\textbf{X}) - \sum_{i=1}^{n} C(\textbf{X}^{-i})$

where $\textbf{X} = \{X_1,\ldots, X_n\}$ and $\textbf{X}^{-i} = \{X_1,\ldots, X_{i-1}, X_{i+1},\ldots, X_n\}$.
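The identity can be checked by expanding both terms, assuming the standard definition of total correlation, $C(\textbf{X}) = \sum_{j=1}^n H(X_j) - H(\textbf{X})$, which is not restated in this article. Each single-variable entropy $H(X_j)$ appears in exactly $n-1$ of the leave-one-out sets, so the marginal terms cancel:

$\begin{align}
(n-1)\,C(\textbf{X}) &= (n-1)\sum_{j=1}^n H(X_j) - (n-1)\,H(\textbf{X}), \\
\sum_{i=1}^n C(\textbf{X}^{-i}) &= (n-1)\sum_{j=1}^n H(X_j) - \sum_{i=1}^n H(\textbf{X}^{-i}), \\
(n-1)\,C(\textbf{X}) - \sum_{i=1}^n C(\textbf{X}^{-i}) &= \sum_{i=1}^n H(\textbf{X}^{-i}) - (n-1)\,H(\textbf{X}) .
\end{align}$

The right-hand side of the last line is Han's original form of the dual total correlation, given in the History section.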

Furthermore, the total correlation and dual total correlation are related by the following bounds:

$\frac{C(X_1, \ldots, X_n)}{n-1} \leq D(X_1, \ldots, X_n) \leq (n-1) \; C(X_1, \ldots, X_n) .$
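Two three-variable examples suggest that both bounds can be attained. They assume the standard definition $C = \sum_i H(X_i) - H(X_1, \ldots, X_n)$ and entropies measured in bits. If $X_1, X_2$ are independent fair bits and $X_3 = X_1 \oplus X_2$, then every variable is determined by the other two and

$C = 3 - 2 = 1, \qquad D = 2 - 0 = 2 = (n-1)\,C ,$

so the upper bound holds with equality. If instead $X_1 = X_2 = X_3$ is a single fair bit copied three times, then

$C = 3 - 1 = 2, \qquad D = 1 - 0 = 1 = \frac{C}{n-1} ,$

so the lower bound holds with equality.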

Finally, the difference between the total correlation and the dual total correlation defines a measure of higher-order information sharing, the O-information:

$\Omega(\textbf{X}) = C(\textbf{X}) - D(\textbf{X})$.

The O-information (first introduced as the "enigmatic information" by James and Crutchfield) is a signed measure that quantifies the extent to which the information in a multivariate random variable is dominated by synergistic interactions (in which case $\Omega(\textbf{X})<0$) or redundant interactions (in which case $\Omega(\textbf{X}) > 0$), and it has found multiple applications in neuroscience.
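The sign convention can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the joint distribution is a NumPy array with one axis per variable, entropies are in bits, total correlation is taken to be the sum of marginal entropies minus the joint entropy, and the dual total correlation is computed in its leave-one-out form. All function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def total_correlation(joint):
    """C = sum_i H(X_i) - H(X_1..X_n) (assumed standard definition)."""
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    h_marginals = 0.0
    for i in range(n):
        other_axes = tuple(j for j in range(n) if j != i)
        h_marginals += entropy(joint.sum(axis=other_axes))
    return h_marginals - entropy(joint)

def dual_total_correlation(joint):
    """D in leave-one-out form: sum_i H(X without X_i) - (n-1) H(X)."""
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    h_leave_one_out = sum(entropy(joint.sum(axis=i)) for i in range(n))
    return h_leave_one_out - (n - 1) * entropy(joint)

def o_information(joint):
    """Omega = C - D; negative means synergy-dominated, positive redundancy-dominated."""
    return total_correlation(joint) - dual_total_correlation(joint)

# Synergistic system: X3 = X1 XOR X2 with X1, X2 independent fair bits.
xor = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        xor[a, b, a ^ b] = 0.25

# Redundant system: one fair bit copied three times.
copy = np.zeros((2, 2, 2))
copy[0, 0, 0] = copy[1, 1, 1] = 0.5

print(o_information(xor))   # negative (C = 1, D = 2)
print(o_information(copy))  # positive (C = 2, D = 1)
```

The XOR system carries information only in the joint state of all three bits, while the copied bit repeats the same information in every variable, which matches the synergy and redundancy readings of the sign.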

==History==
Han (1978) originally defined the dual total correlation as
 $\begin{align}
& D(X_1,\ldots, X_n) \\[10pt]
\equiv {} & \left[ \sum_{i=1}^n H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n ) \right] - (n-1) \; H(X_1, \ldots, X_n) \; .
\end{align}$
However, Abdallah and Plumbley (2010) showed that this is equivalent to the easier-to-understand form of the joint entropy minus the sum of conditional entropies, via the following derivation:

 $\begin{align}
& D(X_1,\ldots, X_n) \\[10pt]
\equiv {} & \left[ \sum_{i=1}^n H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n ) \right] - (n-1) \; H(X_1, \ldots, X_n) \\
= {} & \left[ \sum_{i=1}^n H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n ) \right] + (1-n) \; H(X_1, \ldots, X_n) \\
= {} & H(X_1, \ldots, X_n) + \sum_{i=1}^n \left[ H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n ) - H(X_1, \ldots, X_n) \right] \\
= {} & H\left( X_1, \ldots, X_n \right) - \sum_{i=1}^n H\left( X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n \right)\; .
\end{align}$
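The equivalence of the two forms can also be checked numerically. The sketch below is a sanity check under assumptions, not a proof: it draws an arbitrary joint distribution over four discrete variables (the shape and seed are arbitrary choices), evaluates Han's form from leave-one-out entropies, and evaluates the other form with conditional entropies computed directly from the conditional probabilities rather than through the chain rule. Function names are illustrative and entropies are in bits.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(joint, i):
    """H(X_i | all other variables), computed from p(x_i | rest) directly."""
    rest = np.broadcast_to(joint.sum(axis=i, keepdims=True), joint.shape)
    mask = joint > 0                       # joint > 0 implies rest > 0
    ratio = joint[mask] / rest[mask]       # p(x_i | rest)
    return float(-np.sum(joint[mask] * np.log2(ratio)))

def dtc_han(joint):
    """Han's form: sum_i H(X without X_i) - (n-1) H(X)."""
    n = joint.ndim
    return sum(entropy(joint.sum(axis=i)) for i in range(n)) - (n - 1) * entropy(joint)

def dtc_conditional(joint):
    """Joint entropy minus the sum of conditional entropies."""
    return entropy(joint) - sum(conditional_entropy(joint, i) for i in range(joint.ndim))

# Arbitrary joint pmf over four variables with 2, 3, 2 and 4 states.
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 2, 4))
joint /= joint.sum()

print(np.isclose(dtc_han(joint), dtc_conditional(joint)))  # the two forms agree
```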
