Parallel tempering, in physics and statistics, is a computer simulation method typically used to find the lowest free energy state of a system of many interacting particles at low temperature, that is, the state expected to be observed in reality. It addresses a twofold problem: the stable state at high temperature may differ from the one at low temperature, while simulations run directly at low temperature may become "stuck" in a metastable state. It does this by exploiting the fact that a high-temperature simulation may visit states typical of both the stable and the metastable low-temperature states.
More specifically, parallel tempering (also known as replica exchange MCMC sampling) is a simulation method aimed at improving the dynamic properties of Monte Carlo simulations of physical systems, and of Markov chain Monte Carlo (MCMC) sampling methods more generally. The replica exchange method was originally devised by Swendsen and Wang, then extended by Geyer, and later developed by, among others, Hukushima and Nemoto and Giorgio Parisi. Sugita and Okamoto formulated a molecular dynamics version of parallel tempering, usually known as replica-exchange molecular dynamics (REMD).
Essentially, one runs N copies of the system, randomly initialized, at different temperatures. Then, based on the Metropolis criterion, one exchanges configurations between temperatures. The idea is to make configurations found at high temperatures available to the simulations at low temperatures, and vice versa. The result is a robust ensemble that can sample both low- and high-energy configurations. In this way, thermodynamic properties such as the specific heat, which is in general poorly computed in the canonical ensemble, can be computed with great precision.
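The procedure just described can be sketched in a few lines of Python. Everything specific here is an illustrative assumption rather than a prescribed implementation: the hypothetical one-dimensional double-well energy E(x) = 5(x² − 1)², the temperature ladder, the step size, units with k_B = 1, and the choice of attempting one exchange per sweep between a randomly chosen pair of neighbouring temperatures. The names `energy` and `parallel_tempering` are likewise hypothetical.

```python
import math
import random


def energy(x):
    """Hypothetical toy energy: a double well with minima at x = -1 and x = +1, barrier 5 at x = 0."""
    return 5.0 * (x * x - 1.0) ** 2


def parallel_tempering(temps, n_sweeps, step=0.5, seed=0):
    """Run one replica per temperature (k_B = 1) and return the samples seen at temps[0].

    Each sweep performs one Metropolis move per replica, then one attempted
    exchange between a randomly chosen pair of neighbouring temperatures.
    """
    rng = random.Random(seed)
    # N copies of the system, randomly initialized, one per temperature.
    xs = [rng.uniform(-2.0, 2.0) for _ in temps]
    cold_samples = []
    for _ in range(n_sweeps):
        # Ordinary Metropolis update of every replica at its own temperature.
        for k, t in enumerate(temps):
            proposal = xs[k] + rng.uniform(-step, step)
            d_e = energy(proposal) - energy(xs[k])
            if d_e <= 0 or rng.random() < math.exp(-d_e / t):
                xs[k] = proposal
        # Exchange attempt: accept with min(1, exp[(1/T_k - 1/T_k+1)(E_k - E_k+1)]).
        if len(temps) > 1:
            k = rng.randrange(len(temps) - 1)
            delta = (1.0 / temps[k] - 1.0 / temps[k + 1]) * (energy(xs[k]) - energy(xs[k + 1]))
            if delta >= 0 or rng.random() < math.exp(delta):
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
        cold_samples.append(xs[0])
    return cold_samples


ladder = [0.1, 0.25, 0.6, 1.5, 4.0]  # hypothetical temperature ladder, coldest first
with_exchange = parallel_tempering(ladder, n_sweeps=20000)
without_exchange = parallel_tempering([0.1], n_sweeps=20000)
# Fraction of cold samples in the right-hand well (x > 0).
print(sum(x > 0 for x in with_exchange) / len(with_exchange))
print(sum(x > 0 for x in without_exchange) / len(without_exchange))
```

With a single temperature the function reduces to an ordinary Metropolis chain, which at T = 0.1 typically stays in whichever well it falls into first, so its fraction is close to 0 or 1. With exchanges, configurations that crossed the barrier at the hottest temperature can migrate down the ladder, so the coldest replica visits both wells.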
Typically, a Monte Carlo simulation using a Metropolis–Hastings update consists of a single stochastic process that evaluates the energy of the system and accepts or rejects updates based on the temperature T. At high temperatures, updates that raise the energy of the system are comparatively more likely to be accepted. When the system is highly correlated, updates are frequently rejected, and the simulation is said to suffer from critical slowing down.
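The temperature dependence of a single Metropolis update can be made concrete with a small helper. This is a minimal sketch assuming the standard Metropolis rule min(1, exp(−ΔE / k_B T)); the function name `metropolis_acceptance` is hypothetical and k_B defaults to 1.

```python
import math


def metropolis_acceptance(delta_e, temperature, k_b=1.0):
    """Probability of accepting an update that changes the energy by delta_e at temperature T."""
    if delta_e <= 0:
        return 1.0  # updates that do not raise the energy are always accepted
    return math.exp(-delta_e / (k_b * temperature))


# The same energy increase (2.0) is far more likely to be accepted at high temperature:
# roughly 0.018 at T = 0.5, 0.135 at T = 1.0 and 0.670 at T = 5.0.
for t in (0.5, 1.0, 5.0):
    print(t, metropolis_acceptance(2.0, t))
```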
Suppose we run two simulations at temperatures separated by ΔT. If ΔT is small enough, the energy histograms obtained by collecting the values of the energy over N Monte Carlo steps form two distributions that partially overlap. The overlap can be defined as the area of the histograms that falls over the same interval of energy values, normalized by the total number of samples. As ΔT approaches 0, the overlap approaches 1.
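The overlap defined above can be computed by binning both sets of energies on a shared grid and summing, bin by bin, the smaller of the two normalized counts. The sketch below assumes a fixed number of equal-width bins; `histogram_overlap` and `fake_energies` are hypothetical names, and the Gaussian "energies" are stand-in data in place of real simulation output.

```python
import random


def histogram_overlap(energies_1, energies_2, n_bins=50):
    """Overlap of two normalized energy histograms built on a shared set of bins.

    Returns a value in [0, 1]: 1 for identical histograms, 0 for disjoint ones.
    """
    lo = min(min(energies_1), min(energies_2))
    hi = max(max(energies_1), max(energies_2))
    width = (hi - lo) / n_bins or 1.0  # guard against all energies being equal

    def normalized_histogram(energies):
        counts = [0] * n_bins
        for e in energies:
            # Clamp so that e == hi falls in the last bin rather than past it.
            counts[min(int((e - lo) / width), n_bins - 1)] += 1
        return [c / len(energies) for c in counts]

    h1 = normalized_histogram(energies_1)
    h2 = normalized_histogram(energies_2)
    # Shared area: in each bin, only the smaller of the two fractions is common to both.
    return sum(min(a, b) for a, b in zip(h1, h2))


rng = random.Random(1)


def fake_energies(t, n=20000):
    """Stand-in data: Gaussian 'energies' whose mean grows with temperature (hypothetical model)."""
    return [rng.gauss(10.0 * t, 1.0) for _ in range(n)]


print(histogram_overlap(fake_energies(1.0), fake_energies(1.05)))  # small ΔT: large overlap
print(histogram_overlap(fake_energies(1.0), fake_energies(1.5)))   # large ΔT: small overlap
```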
Another way to interpret this overlap is that system configurations sampled at temperature T1 are likely to appear during a simulation at T2. Because the Markov chain should have no memory of its past, we can define a new update for the combined system made of the two systems at T1 and T2. At a given Monte Carlo step, we can update this global system by swapping the configurations of the two systems or, equivalently, by trading the two temperatures. The update is accepted according to the Metropolis–Hastings criterion with probability

$$ p = \min\left(1, \exp\left[\left(\frac{1}{k_B T_1} - \frac{1}{k_B T_2}\right)\left(E_1 - E_2\right)\right]\right), $$

where E1 and E2 are the energies of the configurations currently at T1 and T2 and k_B is the Boltzmann constant; otherwise the update is rejected. Detailed balance requires that the reverse update be equally likely to be proposed, all else being equal. This can be ensured by choosing between regular Monte Carlo updates and parallel tempering updates with probabilities that are independent of the configurations of the two systems and of the Monte Carlo step.
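A minimal sketch of this exchange rule, with k_B = 1 by default: `swap_acceptance` returns min(1, exp[(1/T1 − 1/T2)(E1 − E2)]), where E1 is the energy of the configuration at T1 and E2 that of the configuration at T2. The helper names are hypothetical. The final lines check detailed balance numerically for one pair of configurations: the canonical weight of the current joint state times the swap probability should equal the weight of the swapped state times the probability of swapping back.

```python
import math


def swap_acceptance(e_1, e_2, t_1, t_2, k_b=1.0):
    """Acceptance probability for exchanging the configurations held at T1 and T2.

    e_1 is the energy of the configuration currently at t_1, e_2 that of the one at t_2.
    """
    exponent = (1.0 / (k_b * t_1) - 1.0 / (k_b * t_2)) * (e_1 - e_2)
    # A non-negative exponent means the swap is always accepted (also avoids overflow in exp).
    return 1.0 if exponent >= 0 else math.exp(exponent)


def boltzmann_weight(e, t, k_b=1.0):
    """Unnormalized canonical weight exp(-E / (k_B T))."""
    return math.exp(-e / (k_b * t))


# Detailed-balance check: configuration a (energy e_a) at T1 and b (energy e_b) at T2.
e_a, e_b, t_1, t_2 = 1.3, 2.7, 0.8, 1.6
forward = boltzmann_weight(e_a, t_1) * boltzmann_weight(e_b, t_2) * swap_acceptance(e_a, e_b, t_1, t_2)
reverse = boltzmann_weight(e_b, t_1) * boltzmann_weight(e_a, t_2) * swap_acceptance(e_b, e_a, t_1, t_2)
print(math.isclose(forward, reverse))  # True: the swap move satisfies detailed balance
```

Note the asymmetry: when the colder temperature holds the higher-energy configuration, the swap is always accepted.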
This update can be generalized to more than two systems.
By a careful choice of temperatures and number of systems, one can improve the mixing properties of a set of Monte Carlo simulations by more than enough to offset the extra computational cost of running parallel simulations.
Other considerations also apply. Increasing the number of temperatures can be detrimental, because the "lateral" movement of a given system across temperatures behaves like a diffusion process: the more temperatures there are, the longer a configuration takes to travel between the hottest and the coldest ones. The setup is important, because neighbouring temperatures must have enough histogram overlap to give a reasonable probability of lateral moves.
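The link between temperature spacing and the probability of lateral moves can be illustrated on a hypothetical system whose canonical energy distribution is known exactly: `dim` independent harmonic coordinates with E = Σ x_i²/2 and k_B = 1, so each coordinate is Gaussian with mean 0 and variance T. Energies are drawn directly from that distribution rather than from a simulation, an assumption made only to keep the sketch short; the function names are hypothetical.

```python
import math
import random


def sample_energy(t, dim, rng):
    """Exact canonical energy of a hypothetical harmonic system, E = sum(x_i**2) / 2.

    With k_B = 1 and unit spring constants, each coordinate is Gaussian with variance T.
    """
    return sum(rng.gauss(0.0, math.sqrt(t)) ** 2 for _ in range(dim)) / 2.0


def mean_swap_acceptance(t_1, t_2, dim=10, n_trials=5000, seed=0):
    """Monte Carlo estimate of the average acceptance of an exchange between T1 and T2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        e_1 = sample_energy(t_1, dim, rng)
        e_2 = sample_energy(t_2, dim, rng)
        exponent = (1.0 / t_1 - 1.0 / t_2) * (e_1 - e_2)
        total += 1.0 if exponent >= 0 else math.exp(exponent)
    return total / n_trials


# Widening the gap between the two temperatures shrinks the histogram overlap,
# and with it the average probability that a lateral move is accepted.
for ratio in (1.05, 1.2, 1.5, 2.0):
    print(ratio, mean_swap_acceptance(1.0, ratio))
```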
The parallel tempering method can be used as a kind of super simulated annealing that does not need restarts, since a system at high temperature can feed new candidate local optima to a system at low temperature, allowing tunneling between metastable states and improving convergence to a global optimum.
- Swendsen, R. H. & Wang, J.-S. (1986). "Replica Monte Carlo simulation of spin glasses". Physical Review Letters. 57: 2607–2609.
- Geyer, C. J. (1991). In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. American Statistical Association, New York. p. 156.
- Hukushima, Koji & Nemoto, Koji (1996). "Exchange Monte Carlo method and application to spin glass simulations". J. Phys. Soc. Jpn. 65 (6): 1604–1608. arXiv:cond-mat/9512035. Bibcode:1996JPSJ...65.1604H. doi:10.1143/JPSJ.65.1604. S2CID 15032087.
- Falcioni, Marco & Deem, Michael W. (1999). "A Biased Monte Carlo Scheme for Zeolite Structure Solution". J. Chem. Phys. 110 (3): 1754. arXiv:cond-mat/9809085. Bibcode:1999JChPh.110.1754F. doi:10.1063/1.477812. S2CID 13963102.
- Earl, David J. & Deem, Michael W. (2005). "Parallel tempering: Theory, applications, and new perspectives". Phys. Chem. Chem. Phys. 7: 3910.
- Sugita, Y. & Okamoto, Y. (1999). "Replica-exchange molecular dynamics method for protein folding". Chemical Physics Letters. 314 (1–2): 141–151. Bibcode:1999CPL...314..141S. doi:10.1016/S0009-2614(99)01123-9.
- Neal, Radford M. (1996). "Sampling from multimodal distributions using tempered transitions". Statistics and Computing. 6 (4): 353–366. doi:10.1007/BF00143556. S2CID 11106113.