# Multicanonical ensemble

In statistics and physics, multicanonical ensemble (also called multicanonical sampling or flat histogram) is a sampling technique that uses the Metropolis–Hastings algorithm to compute integrals where the integrand has a rough landscape with multiple local minima. It samples states according to the inverse of the density of states,[1] which has to be known a priori or be computed using other techniques like the Wang and Landau algorithm.[2] Multicanonical sampling is an important technique for spin systems like the Ising model or spin glasses.[1][3][4]

## Motivation

In systems with a large number of degrees of freedom, like spin systems, Monte Carlo integration is required. In this integration, importance sampling and in particular the Metropolis algorithm, is a very important technique.[3] However, the Metropolis algorithm samples states according to $\exp(-\beta E)$ where beta is the inverse of the temperature. This means that an energy barrier of $\Delta E$ on the energy spectrum is exponentially difficult to overcome.[1] Systems with multiple local energy minima like the Potts model become hard to sample as the algorithm gets stuck in the system's local minima.[3] This motivates other approaches, namely, other sampling distributions.

## Overview

Multicanonical ensemble uses the Metropolis–Hastings algorithm with a sampling distribution given by the inverse of the density of states of the system, contrary to the sampling distribution $\exp(-\beta E)$ of the Metropolis algorithm.[1] With this choice, on average, the number of states sampled at each energy is constant, i.e. it is a simulation with a "flat histogram" on energy. This leads to an algorithm for which the energy barriers are no longer difficult to overcome. Another advantage over the Metropolis algorithm is that the sampling is independent of the temperature of the system, which means that one simulation allows the estimation of thermodynamical variables for all temperatures (thus the name "multicanonical": several temperatures). This is a great improvement in the study of first order phase transitions.[1]

The biggest problem in performing a multicanonical ensemble is that the density of states has to be known a priori.[2][3] One important contribution to multicanonical sampling was the Wang and Landau algorithm, which asymptotically converges to a multicanonical ensemble while calculating the density of states during the convergence.[2]

The multicanonical ensemble is not restricted to physical systems. It can be employed on abstract systems which have a cost function F. By using the density of states with respect to F, the method becomes general for computing higher-dimensional integrals or finding local minima.[5]

## Motivation

Consider a system and it phase-space $\Omega$ characterized by a configuration $\boldsymbol{r}\in \Omega$ and a "cost" function F from the system's phase-space to a one-dimensional space $\Gamma$: $F(\Omega) = \Gamma = [\Gamma_\min, \Gamma_\max]$, the spectrum of F.

 example: The Ising model with N sites is an example of such a system; the phase-space is a discrete phase-space defined by all possible configurations of N spins $\boldsymbol{r} = (\sigma_1, \ldots, \sigma_i, \ldots, \sigma_N)$ where $\sigma_i\in\{-1,1\}$. The cost function is the Hamiltonian of the system: $H(\boldsymbol{r}) = - \sum_{\langle i~j\rangle} J_{ij} (1 - \sigma_i \sigma_j),$ where $$ is the sum over neighborhoods and $J_{ij}$ is the interaction matrix. The energy spectrum is $\Gamma = [E_\min, E_\max]$ which, in this case, depends on the particular $J_{ij}$ used. If all $J_{ij}$ are 1 (the ferromagnetic Ising model), $E_\min = 0$ (e.g. all spins are 1.) and $E_\max = 2 D N$ (half spins are up, half spins are down). Also notice that in this system, $\Gamma \in \mathbb{Z}$

The computation of an average quantity $\langle Q\rangle$ over the phase-space requires the evaluation of an integral:

$\langle Q\rangle = \frac{1}{V}\int_\Omega Q(\boldsymbol{r}) p(\boldsymbol{r}) \,d\boldsymbol{r}$

where $p(\boldsymbol{r})$ is weight of each state per volume,

$p(\boldsymbol{r}) = \frac{1}{V}\int_\Omega p(\boldsymbol{r}) \,d\boldsymbol{r}.$

The density of states in respect with F is given by

$\rho(f) = \frac{1}{V}\int_\Omega \delta(f - F(\boldsymbol{r})) \,d\boldsymbol{r}$

which means that if both Q and p do not depend on the particular state but only on the particular F's value of the state $F(\boldsymbol{r}) = F_\boldsymbol{r}$,

$Q(\boldsymbol{r}) = Q(F_\boldsymbol{r}), p(\boldsymbol{r}) = p(F_\boldsymbol{r}),$

the formula for $\langle Q\rangle$ can be integrated over f by adding a dirac delta function,

\begin{align} \langle Q\rangle & = \int_{\Gamma_\min}^{\Gamma_\max} \int_\Omega Q(F_\boldsymbol{r}) p(F_\boldsymbol{r}) \delta(f - F_\boldsymbol{r}) \,d\boldsymbol{r} \,df \\ & = \int_{\Gamma_\min}^{\Gamma_\max} Q(f) p(f) \int_\Omega \delta(f - F_\boldsymbol{r}) \,d\boldsymbol{r} \, df \\ & = \int_{\Gamma_\min}^{\Gamma_\max} Q(f) p(f) \rho(f) \, df \\ \end{align}

i.e. the knowledge of the density of states allows the computation of averages over F using a one-dimensional integral instead of a multidimensional integral as it is the projection of the number of states on $\Omega$ in $\Gamma$.

 example: A system in contact with a heat bath at inverse temperature $\beta$ is a clear example for computing this kind of integral. For instance, the mean energy of the system is weighted by the Boltzmann factor: $\langle E\rangle = \frac{1}{V}\int_\Omega H(\boldsymbol{r}) \frac{e^{-\beta H(\boldsymbol{r})}}{Z} \, d\boldsymbol{r}$ where $Z = \frac{1}{V}\int_\Omega e^{-\beta H(\boldsymbol{r})} \, d\boldsymbol{r}.$ The density of states $\rho(E)$ can used to compute $\langle E\rangle$: $\langle E\rangle = \int_{E_\min}^{E_\max} E \frac{e^{-\beta E}}{Z} \rho(E) \, dE$

Because the number of states can be high for systems with high number of degrees of freedom, an analytical expression can be hard to obtain and the computation of $\langle Q\rangle$ can be expensive. Typically, because the problem is a multidimensional integral, Monte Carlo integration is normally employed. On the simplest formulation, the method chooses N uniform random states $\boldsymbol{r}_i\in \Omega$, and uses the estimator

$\overline{Q}_N = \frac{1}{N}\sum_{i = 0}^N Q(\boldsymbol{r}_i) p(\boldsymbol{r}_i)$

for computing $\langle Q\rangle$: $\overline{Q}_N$ converges almost surely to $\langle Q\rangle$ by the strong law of large numbers:

$\lim_{N\rightarrow\infty} \overline{Q}_N = \langle Q\rangle.$

One typical problem of this convergence is that the variance of Q can be very high, which leads to a high computational effort to achieve reasonable results.

 example On the previous example, the states that mostly contribute to the integral are the ones with low energy. If the states are sampled uniformly, on average, the number of states which are sampled with energy E is given by the density of states. This density of states can be centered far away from the energy's minima and thus the average can be difficult to obtain.

To improve this convergence, the Metropolis–Hastings algorithm was proposed. Generally, Monte Carlo methods' idea is to use importance sampling to improve the convergence of the estimator $\overline{Q}_N$ by sampling states according to an arbitrary distribution $P(\boldsymbol{r})$ (notice the capital P, different from p), and use the appropriate estimator:

$\overline{Q}_N = \frac{1}{X}\sum_{i = 0}^N Q(\boldsymbol{r}_i) P^{-1}(\boldsymbol{r}_i) p(\boldsymbol{r}_i)$

where $X = \sum_{i = 0}^N P^{-1}(\boldsymbol{r}_i)$.

Notice that when P is a uniform distribution, this estimator equals the one used on a uniform sampling, as it should.

One important choice of P is to define an arbitrary temperature, and use it equals to Boltzmann factor with the energy being the cost function:

$P(\boldsymbol{r}) = p_\mathrm{Boltzmann}(\boldsymbol{r}) = \frac{e^{-\beta F(\boldsymbol{r})}}{\int_\Omega \, d\boldsymbol{r} e^{-\beta F(\boldsymbol{r})}}$

I.e. the lower the cost function of a particular state, the more likely it is to be sampled.

Historically, this occurred because the original idea[6] was exactly to use Metropolis–Hastings algorithm to compute averages on a system in contact with a heat bath where the weight is given by the Boltzmann factor. On these systems, the choice of states according to it led to a considerable improvement on studying physical systems.[3]

However, it is not true that the sampling distribution must equals the weight distribution. One reason for this is that Metropolis–Hastings algorithm fails to converge when the cost function has multiple minima.[1] The reason is that the algorithm uses a random walk with local steps. I.e. the random walk normally performs steps whose energy difference is of order 1. This means that the computational cost to the algorithm leave a specific region with a local minimum exponentially increases with the cost function's value of the minimum. I.e. the deeper the minimum, the more time the algorithm spends there, and harder (exponentially with deep) it will leave.

This is the motivation to introduce a multicanonic ensemble. The idea is to avoid becoming stuck on local minima of the cost function by making them "invisible" to the sampling technique.

## Multicanonic ensemble

A multicanonic ensemble is choosing the sampling distribution used in the importance sampling to be

$P(\boldsymbol{r}) = \frac{1}{\rho(F(\boldsymbol{r}))}$

where

$\rho(f) = \frac{1}{V}\int_\Omega \delta(F(\boldsymbol{r}) - f) \, d\boldsymbol{r}$

is the density of states of the system with respect to the cost function (V is the phase-space volume). The consequence of this choice is that the probability to sample a state and it has cost function f is

$P(f) = \frac{1}{f_\max - f_\min}\int_\Omega \delta(f - F(\boldsymbol{r})) P(\boldsymbol{r})\,d\boldsymbol{r} = \frac{1}{f_\max - f_\min}\frac{1}{V}\rho(f) \int_\Omega \delta(f - F(\boldsymbol{r})) \, d\boldsymbol{r} = \frac{1}{f_\max - f_\min} = \text{constant}$

which motivates the name "flat histogram". I.e. all costs are equally sampled, and thus there are no barriers. For systems in contact with a heat bath, there is another important advantage: because the sampling is independent of the temperature, one simulation is enough to study all temperatures (thus the name "multicanonic": several temperatures).

 example: On the ferromagnetic Ising model with N sites (exemplified on previous section), the density of states can be analytically computed. In this case, a multicanonic ensemble can be used to compute any other quantity Q by sampling the system according to $P(\boldsymbol{r})$ and using the proper estimator $\overline{Q}$ defined on the previous section.

## Tunneling time and critical slowing down

Like in any other Monte Carlo method, there are correlations of the samples being drawn from $P(\boldsymbol{r})$. A typical measurement of the correlation is the tunneling time. The tunneling time is defined by the number of Markov steps (of the Markov chain) the simulation needs to perform a round-trip between the minimum and maximum of the spectrum of F. One motivation to use the tunneling time is that when it crosses the spectra, it passes through the region of the maximum of the density of states, thus de-correlating the process. On the other hand using round-trips ensures that the system visits all the spectrum.

Because the histogram is flat on the variable F, a multicanonic ensemble can be seen as a diffusion process (i.e. a random walk) on the one-dimensional line of F values. Detailed balance of the process dictates that there is no drift on the process.[7] This implies that the tunneling time, in local dynamics, should scale as a diffusion process, and thus the tunneling time should scale quadratically with the size of the spectrum, N:

$\tau_{tt} \propto N^2$

However, in some systems (the Ising model being the most paradigmatic), the scaling suffers from critical slowing down: it is $N^{2+z}$ where $z>0$ depends on the particular system.[4]

Non-local dynamics were developed to improve the scaling to a quadratic scaling[8] (see wolff algorithm), beating the critical slowing down. However, it is still an open question whether there is a local dynamics that does not suffer from critical slowing down in spin systems like Ising model.

## References

1. Berg, B.; Neuhaus, T. (1992). "Multicanonical ensemble: A new approach to simulate first-order phase transitions". Physical Review Letters 68 (1): 9–12. arXiv:hep-lat/9202004. Bibcode:1992PhRvL..68....9B. doi:10.1103/PhysRevLett.68.9. PMID 10045099. edit
2. ^ a b c Wang, F.; Landau, D. (2001). "Efficient, Multiple-Range Random Walk Algorithm to Calculate the Density of States". Physical Review Letters 86 (10): 2050–2053. arXiv:cond-mat/0011174. Bibcode:2001PhRvL..86.2050W. doi:10.1103/PhysRevLett.86.2050. PMID 11289852. edit
3. Newmann, M E J; Barkema, G T (2002). Monte Carlo Methods in Statistical Physics. USA: Oxford University Press. ISBN 0198517971.
4. ^ a b Dayal, P.; Trebst, S.; Wessel, S.; Würtz, D.; Troyer, M.; Sabhapandit, S.; Coppersmith, S. (2004). "Performance Limitations of Flat-Histogram Methods". Physical Review Letters 92 (9). arXiv:cond-mat/0306108. Bibcode:2004PhRvL..92i7201D. doi:10.1103/PhysRevLett.92.097201. edit
5. ^ Lee, J.; Choi, M. (1994). "Optimization by multicanonical annealing and the traveling salesman problem". Physical Review E 50 (2): R651. Bibcode:1994PhRvE..50..651L. doi:10.1103/PhysRevE.50.R651. edit
6. ^ Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. (1953). "Equation of State Calculations by Fast Computing Machines". The Journal of Chemical Physics 21 (6): 1087. Bibcode:1953JChPh..21.1087M. doi:10.1063/1.1699114. edit
7. ^ Robert, Christian; Casella, George (2004). Monte Carlo statistical methods. Springer. ISBN 978-0-387-21239-5.
8. ^ Wolff, U. (1989). "Collective Monte Carlo Updating for Spin Systems". Physical Review Letters 62 (4): 361–364. Bibcode:1989PhRvL..62..361W. doi:10.1103/PhysRevLett.62.361. PMID 10040213. edit