SAMPL Challenge

From Wikipedia, the free encyclopedia

SAMPL (Statistical Assessment of the Modeling of Proteins and Ligands) is a set of community-wide blind challenges aimed at advancing computational techniques as standard predictive tools in rational drug design.[1][2][3][4][5] A broad range of biologically relevant systems of different sizes and levels of complexity, including proteins, host-guest complexes, and drug-like small molecules, have been selected to test the latest modeling methods and force fields in SAMPL. New experimental data, such as binding affinities and hydration free energies, are withheld from participants until the prediction submission deadline, so that the true predictive power of the methods can be assessed. The SAMPL5 challenge contained two prediction categories: the binding affinities of host-guest systems, and the distribution coefficients of drug-like molecules between water and cyclohexane.[6][7] Since 2008, the SAMPL challenge series has attracted widespread interest from scientists around the world engaged in computer-aided drug design (CADD), and it has resulted in around 100 publications, many of them highly cited.[8][9][10] The current SAMPL organizers include Prof. John Chodera at Memorial Sloan Kettering Cancer Center, Prof. Michael K. Gilson at the University of California, San Diego, Prof. David Mobley at the University of California, Irvine, and Prof. Michael Shirts at the University of Colorado, Boulder.[11]

Project significance

The SAMPL challenge seeks to accelerate the development of quantitative, accurate drug discovery tools by providing prospective validation and rigorous comparisons of computational methods and force fields. Computer-aided drug design methods have improved considerably over time, along with the rapid growth of high-performance computing capabilities. However, their applicability in the pharmaceutical industry is still highly limited by insufficient accuracy. Without large-scale prospective validation, methods tend to be over-fitted to pre-existing experimental data. To overcome this, SAMPL challenges are organized as blind tests: for each challenge, new datasets are carefully designed and collected from academic or industrial research laboratories, and the measurements are released shortly after the prediction submission deadline. Researchers can then compare these high-quality, prospective experimental data with the submitted predictions.

SAMPL has historically focused on the properties of host-guest systems and drug-like small molecules. These simple model systems require considerably fewer computational resources to simulate than protein systems, and thus allow much faster convergence. Through careful design, these model systems can also be used to isolate one particular simulation challenge or a subset of them.[12] The past several SAMPL host-guest, hydration free energy and log D challenges revealed the limitations of generalized force fields,[13][14] facilitated the development of solvent models,[15][16] and highlighted the importance of properly handling protonation states and salt effects.[17][18]
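For reference, log D follows the conventional definition of a distribution coefficient: the ratio, at equilibrium, of the solute's total concentration in the organic phase to its total concentration in water, where "total" sums over all of its forms (neutral and ionized species, and any tautomers). For the cyclohexane–water system used in SAMPL5 it can be written as follows. The symbols are chosen here for illustration only.

```latex
\log D_{\mathrm{cyc/w}} = \log_{10} \frac{\sum_i [X_i]_{\mathrm{cyc}}}{\sum_i [X_i]_{\mathrm{w}}}
```

Here [X_i]_cyc and [X_i]_w are the equilibrium concentrations of solute form i in cyclohexane and in water. Because the populations of ionized forms depend on pH, log D is pH-dependent. This distinguishes it from the partition coefficient log P, which refers to the neutral species alone, and it is one reason protonation-state handling matters in these predictions.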


Registration and participation are free for SAMPL challenges. The most recent SAMPL challenge, SAMPL6, required online registration on the Drug Design Data Resource (D3R) website.[19] Instructions, input files and results were then provided through the same website. Participants were allowed to submit multiple predictions through the D3R website, either anonymously or with their research affiliation. Since the SAMPL2 challenge, all participants have been invited to attend the SAMPL workshops and to submit manuscripts describing their results. After peer review, the resulting papers, along with overview papers summarizing all submitted data, were published in special issues of the Journal of Computer-Aided Molecular Design.[20]


While SAMPL serves a clear community need, its future is uncertain because it remains an unfunded initiative. Funding is currently being sought from the NIH to support the design of future SAMPL challenges that drive advances in the areas where modeling efforts most need them.[9][10] If grant funding is not forthcoming, an industrial partnership might help sustain and extend SAMPL. Although SAMPL may be able to continue in its current form for a while longer, it has become increasingly clear that dedicated resources are needed to ensure its continued existence and success.


Earlier SAMPL challenges

The first SAMPL exercise, SAMPL0 (2008),[21] focused on predicting the solvation free energies of 17 small molecules. A research group at Stanford University and scientists at OpenEye Scientific Software carried out the calculations. Despite its informal format, SAMPL0 laid the groundwork for subsequent SAMPL challenges.

The SAMPL1 (2009)[22] and SAMPL2 (2010)[1] challenges were organized by OpenEye and continued to focus on predicting the solvation free energies of drug-like small molecules. Attempts were also made to predict binding affinities, binding poses and tautomer ratios. Both challenges attracted significant participation from computational scientists and researchers in academia and industry.

SAMPL3 and SAMPL4

Blinded data sets for host-guest binding affinities were introduced for the first time in SAMPL3 (2011-2012),[3] along with solvation free energies for small molecules and binding affinity data for 500 fragment-like trypsin inhibitors. All three host molecules were from the cucurbituril family. The SAMPL3 challenge received 103 submissions from 23 research groups worldwide.[2]

Unlike the three previous SAMPL events, the SAMPL4 exercise (2013-2014)[4][5] was coordinated by academic researchers, with logistical support from OpenEye. The SAMPL4 datasets consisted of binding affinities for host-guest systems and HIV integrase inhibitors, as well as hydration free energies of small molecules. The host molecules included cucurbit[7]uril (CB7) and octa-acid. The SAMPL4 hydration challenge received 49 submissions from 19 groups, and participation in the host–guest challenge also grew significantly compared with SAMPL3. The workshop was held at Stanford University in September 2013.


Starting with SAMPL5 (2015-2016),[6][7] the protein-ligand challenges were separated from SAMPL and distributed as the new Grand Challenges of the Drug Design Data Resource (D3R).[23] SAMPL5 asked participants to predict the binding affinities of three sets of host-guest systems: an acyclic CB7 derivative and two hosts from the octa-acid family. Participants were also encouraged to submit predictions of binding enthalpies. A wide array of computational methods was tested, including density functional theory (DFT), molecular dynamics, docking, and metadynamics. Distribution coefficient predictions were introduced for the first time, receiving a total of 76 submissions from 18 research groups or individual scientists for a set of 53 small molecules. The workshop was held in March 2016 at the University of California, San Diego, as part of the D3R workshop. The top-performing methods in the host-guest challenge yielded encouraging yet imperfect correlations with experimental data, accompanied by large, systematic shifts relative to experiment.[24][25]
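The distinction between good correlation and a systematic shift can be made concrete with common error statistics. The following is a minimal sketch, not the official SAMPL analysis protocol, that scores predictions against experiment using root-mean-square error (RMSE), mean signed error (MSE, which captures a systematic shift), and the Pearson correlation coefficient R. The helper name score_predictions is hypothetical, the choice of these three metrics is an assumption, and the free energy values are made-up placeholders rather than SAMPL data.

```python
import math
from statistics import mean


def score_predictions(predicted, experimental):
    """Return (RMSE, mean signed error, Pearson R) for paired values.

    Hypothetical helper for illustration; not the official SAMPL analysis code.
    """
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need at least two paired values of equal length")
    errors = [p - e for p, e in zip(predicted, experimental)]
    rmse = math.sqrt(mean(err * err for err in errors))
    # Mean signed error: a nonzero value indicates a systematic shift
    # (negative means predictions are too low on average).
    mse = mean(errors)
    mp, me = mean(predicted), mean(experimental)
    cov = sum((p - mp) * (e - me) for p, e in zip(predicted, experimental))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    se = math.sqrt(sum((e - me) ** 2 for e in experimental))
    r = cov / (sp * se)
    return rmse, mse, r


# Placeholder binding free energies in kcal/mol (illustrative only, not SAMPL data).
experimental = [-4.0, -5.5, -6.0, -7.5, -9.0]
# Predictions that follow the experimental trend exactly but are shifted by -2.0 kcal/mol.
predicted = [x - 2.0 for x in experimental]

rmse, mse, r = score_predictions(predicted, experimental)
print(f"RMSE = {rmse:.2f}, MSE = {mse:.2f}, R = {r:.2f}")
# prints "RMSE = 2.00, MSE = -2.00, R = 1.00"
```

Because every placeholder prediction is off by the same amount, R is 1.00 while the RMSE equals the 2.0 kcal/mol offset. Strong correlation can therefore coexist with a large systematic shift, which RMSE and MSE expose but R alone does not.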


The SAMPL6 test systems included cucurbit[8]uril, octa-acid, tetra-endo-methyl octa-acid, and a series of fragment-like small molecules. The host-guest, conformational sampling and pKa prediction challenges of SAMPL6 have closed. The SAMPL6 workshop was run jointly with the D3R workshop on February 22–23, 2018, at the Scripps Institution of Oceanography, La Jolla, CA.

References


  1. Geballe, Matthew T.; Skillman, A. Geoffrey; Nicholls, Anthony; Guthrie, J. Peter; Taylor, Peter J. (2010-05-09). "The SAMPL2 blind prediction challenge: introduction and overview". Journal of Computer-Aided Molecular Design. 24 (4): 259–279. doi:10.1007/s10822-010-9350-8. ISSN 0920-654X.
  2. Skillman, A. Geoffrey (2012-05-24). "SAMPL3: blinded prediction of host–guest binding affinities, hydration free energies, and trypsin inhibitors". Journal of Computer-Aided Molecular Design. 26 (5): 473–474. doi:10.1007/s10822-012-9580-z. ISSN 0920-654X.
  3. Muddana, Hari S.; Varnado, C. Daniel; Bielawski, Christopher W.; Urbach, Adam R.; Isaacs, Lyle; Geballe, Matthew T.; Gilson, Michael K. (2012-02-25). "Blind prediction of host–guest binding affinities: a new SAMPL3 challenge". Journal of Computer-Aided Molecular Design. 26 (5): 475–487. doi:10.1007/s10822-012-9554-1. ISSN 0920-654X. PMC 3383923. PMID 22366955.
  4. Muddana, Hari S.; Fenley, Andrew T.; Mobley, David L.; Gilson, Michael K. (2014-03-06). "The SAMPL4 host–guest blind prediction challenge: an overview". Journal of Computer-Aided Molecular Design. 28 (4): 305–317. doi:10.1007/s10822-014-9735-1. ISSN 0920-654X. PMC 4053502. PMID 24599514.
  5. Mobley, David L.; Wymer, Karisa L.; Lim, Nathan M.; Guthrie, J. Peter (2014-03-11). "Blind prediction of solvation free energies from the SAMPL4 challenge". Journal of Computer-Aided Molecular Design. 28 (3): 135–150. doi:10.1007/s10822-014-9718-2. ISSN 0920-654X. PMC 4006301. PMID 24615156.
  6. Yin, Jian; Henriksen, Niel M.; Slochower, David R.; Shirts, Michael R.; Chiu, Michael W.; Mobley, David L.; Gilson, Michael K. (2016-09-22). "Overview of the SAMPL5 host–guest challenge: Are we doing better?". Journal of Computer-Aided Molecular Design: 1–19. doi:10.1007/s10822-016-9974-4. ISSN 0920-654X.
  7. Bannan, Caitlin C.; Burley, Kalistyn H.; Chiu, Michael; Shirts, Michael R.; Gilson, Michael K.; Mobley, David L. (2016-09-27). "Blind prediction of cyclohexane–water distribution coefficients from the SAMPL5 challenge". Journal of Computer-Aided Molecular Design. 30 (11): 927–944. doi:10.1007/s10822-016-9954-8. ISSN 0920-654X. PMC 5209301. PMID 27677750.
  8. Mobley, David L.; Chodera, John D.; Gilson, Michael K. (2017-06-21). "Results of the 2017 Roadmap survey of the Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenge community". eScholarship.
  9. "Advancing predictive modeling through focused development of model systems to drive new modeling innovations". Zenodo. doi:10.5281/zenodo.163963.
  10. Mobley, David L. (2016-10-05). "Advancing predictive modeling through focused development of model systems to drive new modeling innovations". eScholarship.
  11. "D3R | SAMPL".
  12. Mobley, David L.; Gilson, Michael K. (2016-12-08). "Predicting binding free energies: Frontiers and benchmarks". bioRxiv 074625.
  13. Muddana, Hari S.; Gilson, Michael K. (2012-01-25). "Prediction of SAMPL3 host–guest binding affinities: evaluating the accuracy of generalized force-fields". Journal of Computer-Aided Molecular Design. 26 (5): 517–525. doi:10.1007/s10822-012-9544-3. ISSN 0920-654X. PMC 3383906. PMID 22274835.
  14. Mobley, David L.; Liu, Shuai; Cerutti, David S.; Swope, William C.; Rice, Julia E. (2011-12-24). "Alchemical prediction of hydration free energies for SAMPL". Journal of Computer-Aided Molecular Design. 26 (5): 551–562. doi:10.1007/s10822-011-9528-8. ISSN 0920-654X. PMC 3583515. PMID 22198475.
  15. Pal, Rajat Kumar; Haider, Kamran; Kaur, Divya; Flynn, William; Xia, Junchao; Levy, Ronald M.; Taran, Tetiana; Wickstrom, Lauren; Kurtzman, Tom; Gallicchio, Emilio (2016-09-30). "A combined treatment of hydration and dynamical effects for the modeling of host–guest binding thermodynamics: the SAMPL5 blinded challenge". Journal of Computer-Aided Molecular Design: 1–16. doi:10.1007/s10822-016-9956-6. ISSN 0920-654X.
  16. Brini, Emiliano; Paranahewage, S. Shanaka; Fennell, Christopher J.; Dill, Ken A. (2016-09-08). "Adapting the semi-explicit assembly solvation model for estimating water-cyclohexane partitioning with the SAMPL5 molecules". Journal of Computer-Aided Molecular Design. 30 (11): 1067–1077. doi:10.1007/s10822-016-9961-9. ISSN 0920-654X.
  17. Sure, Rebecca; Antony, Jens; Grimme, Stefan (2014-03-27). "Blind Prediction of Binding Affinities for Charged Supramolecular Host–Guest Systems: Achievements and Shortcomings of DFT-D3". The Journal of Physical Chemistry B. 118 (12): 3431–3440. doi:10.1021/jp411616b. ISSN 1520-6106.
  18. Klamt, Andreas; Eckert, Frank; Reinisch, Jens; Wichmann, Karin (2016-07-26). "Prediction of cyclohexane-water distribution coefficients with COSMO-RS on the SAMPL5 data set". Journal of Computer-Aided Molecular Design. 30 (11): 959–967. doi:10.1007/s10822-016-9927-y. ISSN 0920-654X.
  19. "D3R | Challenges". Retrieved 2017-01-12.
  20. "Journal of Computer-Aided Molecular Design - All Volumes & Issues - Springer". Retrieved 2017-01-12.
  21. Nicholls, Anthony; Mobley, David L.; Guthrie, J. Peter; Chodera, John D.; Bayly, Christopher I.; Cooper, Matthew D.; Pande, Vijay S. (2008-02-01). "Predicting Small-Molecule Solvation Free Energies: An Informal Blind Test for Computational Chemistry". Journal of Medicinal Chemistry. 51 (4): 769–779. doi:10.1021/jm070549+. ISSN 0022-2623.
  22. Guthrie, J. Peter (2009-04-09). "A Blind Challenge for Computational Solvation Free Energies: Introduction and Overview". The Journal of Physical Chemistry B. 113 (14): 4501–4507. doi:10.1021/jp806724u. ISSN 1520-6106.
  23. Gathiaka, Symon; Liu, Shuai; Chiu, Michael; Yang, Huanwang; Stuckey, Jeanne A.; Kang, You Na; Delproposto, Jim; Kubish, Ginger; Dunbar, James B. (2016-09-30). "D3R grand challenge 2015: Evaluation of protein–ligand pose and affinity predictions". Journal of Computer-Aided Molecular Design. 30 (9): 651–668. doi:10.1007/s10822-016-9946-8. ISSN 0920-654X.
  24. Yin, Jian; Henriksen, Niel M.; Slochower, David R.; Gilson, Michael K. (2016-09-16). "The SAMPL5 host–guest challenge: computing binding free energies and enthalpies from explicit solvent simulations by the attach-pull-release (APR) method". Journal of Computer-Aided Molecular Design: 1–13. doi:10.1007/s10822-016-9970-8. ISSN 0920-654X.
  25. Bosisio, Stefano; Mey, Antonia S. J. S.; Michel, Julien (2016-08-08). "Blinded predictions of host-guest standard free energies of binding in the SAMPL5 challenge". Journal of Computer-Aided Molecular Design: 1–10. doi:10.1007/s10822-016-9933-0. ISSN 0920-654X.

External links