Rare event sampling

Rare event sampling is an umbrella term for a group of computer simulation methods that selectively sample 'special' regions of a system's dynamic space, regions which the system is unlikely to visit in a brute-force simulation. A familiar example of a rare event in this context is the nucleation of a raindrop from supersaturated water vapour: although raindrops form every day, the formation of a liquid droplet is extremely rare relative to the length and time scales defined by the motion of water molecules in the vapour phase.
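
To make the cost of brute-force sampling concrete, the following minimal sketch uses the standard binomial result that a crude Monte Carlo estimate of a probability p from N independent trials has relative standard error sqrt((1 - p) / (N p)). The function name and the example probability are illustrative assumptions and are not taken from any of the methods discussed here.

```python
import math

def trials_needed(p, rel_err):
    """Independent brute-force trials needed to estimate a probability p
    to the given relative standard error (binomial statistics)."""
    return math.ceil((1.0 - p) / (p * rel_err ** 2))

# Hypothetical rare event with probability 1e-9 per trial, 10% target error.
n = trials_needed(1e-9, 0.1)
print(f"{n:.3e} trials")
```

The required number of trials grows as 1/p, which is why methods that concentrate effort on the rare region are attractive.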

Due to the wide use of computer simulation across very different domains, articles on the topic arise from quite disparate sources and it is difficult to make a coherent survey of rare event sampling techniques.^[1] Contemporary methods include transition-path sampling (TPS),^[2] replica exchange transition interface sampling (RETIS),^[3] repetitive simulation trials after reaching thresholds (RESTART),^[4] forward flux sampling (FFS),^[5]^[6] generalized splitting,^[7]^[8] adaptive multilevel splitting (AMS),^[9] stochastic-process rare-event sampling (SPRES),^[10] line sampling,^[11] subset simulation,^[12] and weighted ensemble (WE).^[13]^[14] The first published rare event technique was by Herman Kahn and Theodore Edward Harris in 1951,^[15] who in turn referred to an unpublished technical report by John von Neumann and Stanislaw Ulam.

Time dependence

If a system is out of thermodynamic equilibrium, then it is possible that there will be time-dependence in the rare event flux. In order to follow the time evolution of the probability of a rare event, it is necessary to maintain a steady current of trajectories into the target region of configurational space. SPRES is specifically designed for this eventuality and AMS is also at least formally valid for applications in which this is required.

Where a dissipative steady state obtains (i.e. the conditions for thermodynamic equilibrium are not met, but the rare event flux is nonetheless constant), FFS and other methods can be appropriate, as can the typically more expensive full-nonequilibrium approaches.
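
As a rough illustration of the interface-based idea shared by splitting-type methods such as FFS, the sketch below estimates the probability that a biased one-dimensional random walk reaches a distant level before returning to its starting basin. The estimate is built as a product of conditional interface-crossing probabilities. The walk, the interface positions and all function names are assumptions chosen for illustration. The sketch omits the initial flux measurement that a full FFS rate calculation also requires, and it is not the implementation of any cited package.

```python
import random

def run_until(x, lower, upper, q, rng):
    """Advance a biased +/-1 random walk from x until it hits lower or upper."""
    while lower < x < upper:
        x += 1 if rng.random() < q else -1
    return x

def splitting_estimate(interfaces, q, trials, rng):
    """Estimate P(reach interfaces[-1] before 0 | start at interfaces[0])
    as a product of conditional interface-crossing probabilities."""
    starts = [interfaces[0]]      # configurations stored at the current interface
    prob = 1.0
    for nxt in interfaces[1:]:
        hits = []
        for _ in range(trials):
            x = run_until(rng.choice(starts), 0, nxt, q, rng)
            if x >= nxt:          # reached the next interface before the basin
                hits.append(x)
        if not hits:
            return 0.0
        prob *= len(hits) / trials
        starts = hits
    return prob

def exact(start, target, q):
    """Gambler's-ruin result for the same probability (requires q != 0.5)."""
    r = (1.0 - q) / q
    return (1.0 - r ** start) / (1.0 - r ** target)

rng = random.Random(0)
q = 0.3                           # probability of a step towards the target
interfaces = [1, 3, 5, 7, 9, 11, 13, 15]
print(splitting_estimate(interfaces, q, 2000, rng), exact(1, 15, q))
```

Each stage only has to resolve a moderately unlikely crossing, whereas a single brute-force trial would have to achieve the whole transition at once.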

Landscape methods

If the assumption of thermodynamic equilibrium is made, then there is no time-dependence in the rare event flux, and a thermodynamic rather than statistical approach to the problem may be more appropriate. These methods are generally thought of separately from rare event methods, but may address the same problems. In these strategies, a free energy landscape (or an energy landscape, for small systems) is prepared. For a small system this landscape may be mapped entirely, while for a system with a larger number of degrees of freedom a projection onto some set of progress coordinates will still be required.
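
The projection onto a progress coordinate can be illustrated with the equilibrium relation F(x) = -kT ln P(x), where P(x) is the probability of observing the coordinate value x. The sketch below builds such a profile from a histogram of samples. The Gaussian test data, the bin width and the function name are assumptions made for illustration. In a real rare event problem the barrier regions would typically be too poorly sampled for a plain histogram to suffice.

```python
import math
import random
from collections import Counter

def free_energy_profile(samples, bin_width, kT=1.0):
    """Return {bin_centre: F} with F = -kT ln P(bin), shifted so min(F) = 0."""
    counts = Counter(math.floor(s / bin_width) for s in samples)
    total = sum(counts.values())
    f = {(b + 0.5) * bin_width: -kT * math.log(c / total)
         for b, c in counts.items()}
    f_min = min(f.values())
    return {x: v - f_min for x, v in sorted(f.items())}

# Hypothetical equilibrium samples of a progress coordinate: a unit Gaussian,
# corresponding to a harmonic free energy F(x) = x**2 / 2 in units of kT.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
for x, f in free_energy_profile(samples, 0.5).items():
    print(f"{x:+.2f}  {f:.2f}")
```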

Having mapped the landscape, and making certain assumptions, transition-state theory can then be used to yield a description of the probabilities of paths within it. An example method for mapping landscapes is replica exchange simulation, which has the advantage when applied to rare event problems that piecewise correct trajectory fragments are generated in the course of the method, allowing some direct analysis of the dynamic behaviour even without generating the full landscape.
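
For replica exchange between two replicas held at different temperatures, a proposed exchange of configurations is commonly accepted with the Metropolis probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β is the inverse temperature and E the potential energy of each replica. The sketch below implements only this acceptance test. The function and argument names are illustrative assumptions, not the interface of any particular simulation package.

```python
import math
import random

def swap_accepted(beta_i, beta_j, energy_i, energy_j, rng):
    """Metropolis criterion for exchanging the configurations of two replicas
    at inverse temperatures beta_i and beta_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return delta >= 0.0 or rng.random() < math.exp(delta)

rng = random.Random(0)
# The colder replica (larger beta) currently holds the higher energy,
# so the exchange is always accepted.
print(swap_accepted(2.0, 1.0, 5.0, 1.0, rng))
```

Between exchanges each replica evolves under unmodified dynamics at its own temperature, which is why the trajectory fragments it produces are piecewise correct.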

Related software

The R package mistral (CRAN and dev version) provides rare event simulation tools.
The Python toolset freshs.org is an example toolkit for distributing FFS and SPRES calculations, running sampling trials concurrently on parallel hardware or in a distributed manner across a network.
PyRETIS,^[16] an open-source Python library for performing transition interface sampling (TIS) and RETIS simulations. It is interfaced with common software for MD (GROMACS) and QM/MD (CP2K) simulations.
WESTPA and wepy are packages for weighted ensemble simulation.
PyVisA,^[17] analysis and visualization software for path sampling outputs, with integrated machine-learning-based algorithms.

References

[1] Morio, J.; Balesdent, M. (2014). "A survey of rare event simulation methods for static input–output models" (PDF). Simulation Modelling Practice and Theory. 49 (4): 287–304. doi:10.1016/j.simpat.2014.10.007.
[2] Dellago, Christoph; Bolhuis, Peter G.; Geissler, Phillip L. (2002). Transition Path Sampling. Vol. 123. pp. 1–84. doi:10.1002/0471231509.ch1. ISBN 978-0-471-21453-3.
[3] Riccardi, Enrico; Dahlen, Oda; van Erp, Titus S. (2017-09-06). "Fast Decorrelating Monte Carlo Moves for Efficient Path Sampling". The Journal of Physical Chemistry Letters. 8 (18): 4456–4460. doi:10.1021/acs.jpclett.7b01617. hdl:11250/2491276. ISSN 1948-7185. PMID 28857565.
[4] Villén-Altamirano, Manuel; Villén-Altamirano, José (1994). "Restart: a straightforward method for fast simulation of rare events". Written at San Diego, CA, USA. Proceedings of the 26th Winter simulation conference. WSC '94. Orlando, Florida, United States: Society for Computer Simulation International. pp. 282–289. ISBN 0-7803-2109-X. acmid 194044.
[5] Allen, Rosalind J.; Warren, Patrick B.; ten Wolde, Pieter Rein (2005). "Sampling Rare Switching Events in Biochemical Networks". Physical Review Letters. 94 (1): 018104. arXiv:q-bio/0406006. Bibcode:2005PhRvL..94a8104A. doi:10.1103/PhysRevLett.94.018104. PMID 15698138. S2CID 7998065.
[6] Allen, Rosalind J.; Valeriani, Chantal; ten Wolde, Pieter Rein (2009). "Forward flux sampling for rare event simulations". Journal of Physics: Condensed Matter. 21 (46): 463102. arXiv:0906.4758. Bibcode:2009JPCM...21T3102A. doi:10.1088/0953-8984/21/46/463102. PMID 21715864. S2CID 10222109.
[7] Botev, Z. I.; Kroese, D. P. (2008). "Efficient Monte Carlo simulation via the generalized splitting method". Methodology and Computing in Applied Probability. 10 (4): 471–505. CiteSeerX 10.1.1.399.7912. doi:10.1007/s11009-008-9073-7. S2CID 1147040.
[8] Botev, Z. I.; Kroese, D. P. (2012). "Efficient Monte Carlo simulation via the generalized splitting method". Statistics and Computing. 22 (1): 1–16. doi:10.1007/s11222-010-9201-4. S2CID 14970946.
[9] Cérou, Frédéric; Guyader, Arnaud (2005). Adaptive multilevel splitting for rare event analysis (Technical report). INRIA. RR-5710.
[10] Berryman, Joshua T.; Schilling, Tanja (2010). "Sampling rare events in nonequilibrium and nonstationary systems". The Journal of Chemical Physics. 133 (24): 244101. arXiv:1001.2456. Bibcode:2010JChPh.133x4101B. doi:10.1063/1.3525099. PMID 21197970. S2CID 34154184.
[11] Schueller, G. I.; Pradlwarter, H. J.; Koutsourelakis, P. (2004). "A critical appraisal of reliability estimation procedures for high dimensions". Probabilistic Engineering Mechanics. 19 (4): 463–474. doi:10.1016/j.probengmech.2004.05.004.
[12] Au, S.K.; Beck, James L. (October 2001). "Estimation of small failure probabilities in high dimensions by subset simulation". Probabilistic Engineering Mechanics. 16 (4): 263–277. CiteSeerX 10.1.1.131.1941. doi:10.1016/S0266-8920(01)00019-4.
[13] Zuckerman, Daniel M.; Chong, Lillian T. (2017-05-22). "Weighted Ensemble Simulation: Review of Methodology, Applications, and Software". Annual Review of Biophysics. 46 (1): 43–57. doi:10.1146/annurev-biophys-070816-033834. ISSN 1936-122X. PMC 5896317. PMID 28301772.
[14] Huber, G.A.; Kim, S. (January 1996). "Weighted-ensemble Brownian dynamics simulations for protein association reactions". Biophysical Journal. 70 (1): 97–110. Bibcode:1996BpJ....70...97H. doi:10.1016/S0006-3495(96)79552-8. PMC 1224912. PMID 8770190.
[15] Kahn, H.; Harris, T.E. (1951). "Estimation of particle transmission by random sampling". National Bureau of Standards Appl. Math. Series. 12: 27–30.
[16] Riccardi, Enrico; Lervik, Anders; van Erp, Titus S. (2020). "PyRETIS 2: An improbability drive for rare events". Journal of Computational Chemistry. 41 (4): 370–377. doi:10.1002/jcc.26112. PMID 31742744.
[17] Aarøen, Ola; Kiær, Henrik; Riccardi, Enrico (2020). "PyVisA: Visualization and Analysis of path sampling trajectories". Journal of Computational Chemistry. 42 (6): 435–446. doi:10.1002/jcc.26467. PMID 33314210. S2CID 229179978.
