Rare Event Sampling
Rare event sampling is an umbrella term for a group of computer simulation methods intended to selectively sample 'special' regions of the dynamic space of systems which are unlikely to visit those special regions through brute-force simulation. A familiar example of a rare event in this context would be nucleation of a raindrop from over-saturated water vapour: although raindrops form every day, relative to the length and time scales defined by the motion of water molecules in the vapour phase, the formation of a liquid droplet is extremely rare.
Due to the wide use of computer simulation across very different fields of endeavour, articles on the topic arise from quite disparate sources and it is difficult to make a coherent survey of rare event sampling techniques. Contemporary methods include Transition Path Sampling (TPS), Repetitive Simulation Trials After Reaching Thresholds (RESTART), Forward Flux Sampling (FFS), Generalized Splitting, Adaptive Multilevel Splitting (AMS), Stochastic Process Rare Event Sampling (SPRES), Line sampling and Subset simulation. The first published rare event technique was by Herman Kahn and Theodore Edward Harris in 1951, who in turn referred to an unpublished technical report by John von Neumann and Stanislaw Ulam.
Generation of Trajectory Fragments
To generate simulation trajectories it is typically necessary to find some way of altering an existing configuration or trajectory so as to explore new regions of the configurational space.
A common theme in RESTART, FFS, AMS and SPRES (at least) is the idea of 'splitting' (or 'enrichment'), in which trajectories of a stochastic system are made to diverge by changing the seed of the random number generator. Selectively splitting trajectories allows the simulation to over-sample regions of its dynamical space which are judged in some sense interesting. In order to limit computational cost, trajectories which are judged relatively less interesting (less close to manifesting the target rare event) must be therefore be killed (or 'pruned'). The differences between these splitting algorithms then manifest in the choice of pruning and enrichment approach.
While it is often reasonable to introduce randomness to a system which would otherwise be considered deterministic (such as by coupling a microscopic system to a fluctuating heat bath, or by perturbing the velocities in a mesoscopic system after collisions) splitting methods in general sit most comfortably with the study of systems which have a large natural stochastic element in their dynamics. The contrast to splitting methods is provided by TPS, in which 'shooting' is instead typically employed: for a strictly deterministic system, the new path must be created by making a small change to the initial conditions rather than by branching from a point mid-way into the simulation. If the dynamics of a system are chaotic then shooting moves based on perturbation of the start coordinates should have a very high rate of failure in generating rare events, however a benefit is that shooting can function equally well (if time-symmetry is present) by making a perturbation to the final conditions of the system (post rare-event) and running the simulation backwards, thereby sampling new paths with the guarantee that they terminate in the rare event. If the assumptions of equilibrium holding only at the initial (or final) conditions are relaxed, then shooting moves can also be made by perturbing the system during a shot.
Choice of Progress Coordinate
In addition to a splitting or shooting procedure to generate new trajectories, most rare event methods (certainly RESTART, FFS, AMS and SPRES) require a 'progress coordinate' to be defined. This is a scalar or vector-valued function which indicates the closeness of an instantaneous state of the simulation to manifesting the target rare event. Definition of a suitable progress coordinate can prove non-trivial, and must in many cases be carried out with an understanding of the weaknesses of the particular algorithm. An example of a situation in which a non-trivial progress coordinate might be required is for protein folding: a simple root-mean-square deviation might prove to be non-monotonic in the true reaction coordinate, leading potentially to trapping of the calculation, depending on the algorithm used.
A second potential pitfall in the choice of progress coordinate which must be considered in relation to the algorithm, is the computational expense associated with an evaluation of the progress coordinate. FFS and AMS formally require evaluation of the progress coordinate at every timestep of the dynamics, which can in theory be more expensive than the dynamics itself for systems which have short-range interactions but whose emergent order is defined with reference to some long-range (therefore many-particle) correlation.
If a system is out of thermodynamic equilibrium, then it is possible that there will be time-dependence in the rare event flux. In order to follow the time evolution of the probability of a rare event it is necessary to maintain a steady current of trajectories into the target region of configurational space. SPRES is specifically designed for this eventuality and AMS is also at least formally valid for applications in which this is required.
In cases where a dissipative steady state obtains (i.e. the conditions for thermodynamic equilibrium are not met, but the rare event flux is nonetheless constant) then FFS and other methods can be appropriate as well as the typically more expensive full-nonequilibrium approaches.
If the assumption of thermodynamic equilibrium is made, then there is no time-dependence in the rare event flux and a thermodynamic rather than statistical approach to the problem may be more appropriate. These methods are generally thought of separately to rare event methods, but may address the same problems. In these strategies, a free energy landscape (or an energy landscape, for small systems) is prepared. For a small system this landscape may be mapped entirely, while for a system with a larger number of degrees of freedom a projection onto some set of progress coordinates will still be required.
Having mapped the landscape, and making certain assumptions, Transition state theory can then be used yield a description of the probabilities of paths within it. An example method for mapping landscapes is Replica exchange simulation, which has the advantage when applied to rare event problems that piecewise correct trajectory fragments are generated in the course of the method, allowing some direct analysis of the dynamic behaviour even without generating the full landscape.
- Rare events
- R package mistral for rare event simulation tools
- The Python toolset freshs.org as an example toolkit for distributing FFS and SPRES calculations to run sampling trials concurrently.
- Morio, J.; Balesdent, M. (2014). "A survey of rare event simulation methods for static input–output models". Simulation Modelling Practice and Theory. 49 (4): 287–304. doi:10.1016/j.simpat.2014.10.007.
- Dellago, Christoph; Bolhuis, Peter G.; Geissler, Phillip L. (2002). "Transition Path Sampling". Advances in Chemical Physics. 123 (1): 1–84. doi:10.1002/0471231509.ch1. ISBN 0-471-21453-1.
- Villén-Altamirano, Manuel; Villén-Altamirano, José (1994). "Restart: a straightforward method for fast simulation of rare events". Written at San Diego, CA, USA. Proceedings of the 26th Winter simulation conference. WSC '94. Orlando, Florida, United States: Society for Computer Simulation International. pp. 282–289. ISBN 0-7803-2109-X. acmid 194044.
- Allen, Rosalind J.; ten Wolde, Pieter Rein; Rein Ten Wolde, Pieter (2009). "Forward flux sampling for rare event simulations". Journal of Physics: Condensed Matter. 21 (46): 463102. doi:10.1088/0953-8984/21/46/463102.
- Botev, Z. I.; Kroese, D. P. (2008). "Efficient Monte Carlo simulation via the generalized splitting method". Methodology and Computing in Applied Probability. 10 (4): 1–16. doi:10.1007/s11009-008-9073-7.
- Botev, Z. I.; Kroese, D. P. (2012). "Efficient Monte Carlo simulation via the generalized splitting method". Statistics and Computing. 22 (1): 1–16. doi:10.1007/s11222-010-9201-4.
- Cerou., Frédéric; Arnaud Guyader (2005). Adaptive multilevel splitting for rare event analysis (Technical report). INRIA. RR-5710.
- Berryman, Joshua T.; Schilling, Tanja (2010). "Sampling rare events in nonequilibrium and nonstationary systems". The Journal of Chemical Physics. 133 (24): 244101. doi:10.1063/1.3525099. PMID 21197970.
- Schueller, G. I.; Pradlwarter, H. J.; Koutsourelakis, P. (2004). "A critical appraisal of reliability estimation procedures for high dimensions". Probabilistic Engineering Mechanics. 19 (4): 463–474. doi:10.1016/j.probengmech.2004.05.004.
- Au, S.K.; Beck, James L. (October 2001). "Estimation of small failure probabilities in high dimensions by subset simulation". Probabilistic Engineering Mechanics. 16 (4): 263–277. doi:10.1016/S0266-8920(01)00019-4.
- Kahn, H.; Harris, T.E. (1951). "Estimation of particle transmission by random sampling". National Bureau of Standards Appl. Math. Series. 12: 27–30.