Scoring functions for docking
| Docking glossary |
|---|
| • Receptor or host or lock – The "receiving" molecule, most commonly a protein or other biopolymer. |
| • Ligand or guest or key – The complementary partner molecule which binds to the receptor. Ligands are most often small molecules but could also be another biopolymer. |
| • Docking – Computational simulation of a candidate ligand binding to a receptor. |
| • Binding mode – The orientation of the ligand relative to the receptor as well as the conformation of the ligand and receptor when bound to each other. |
| • Pose – A candidate binding mode. |
| • Scoring – The process of evaluating a particular pose by counting the number of favorable intermolecular interactions such as hydrogen bonds and hydrophobic contacts. |
| • Ranking – The process of classifying which ligands are most likely to interact favorably with a particular receptor based on the predicted free energy of binding. |
In the fields of computational chemistry and molecular modelling, scoring functions are fast approximate mathematical methods used to predict the strength of the non-covalent interaction (also referred to as binding affinity) between two molecules after they have been docked. Most commonly one of the molecules is a small organic compound such as a drug and the second is the drug's biological target such as a protein receptor.[1] Scoring functions have also been developed to predict the strength of other types of intermolecular interactions, for example between two proteins[2] or between protein and DNA.[3]
Utility
Scoring functions are widely used in drug discovery and other molecular modelling applications. These include:[4]
- Virtual screening of small molecule databases of candidate ligands to identify novel small molecules that bind to a protein target of interest and therefore are useful starting points for drug discovery[5]
- De novo design (design "from scratch") of novel small molecules that bind to a protein target[6]
- Lead optimization of screening hits to optimize their affinity and selectivity[7]
A potentially more reliable but much more computationally demanding alternative to scoring functions is the free energy perturbation calculation.[8]
Prerequisites
Scoring functions are normally parameterized (or trained) against a data set of experimentally determined binding affinities between molecular species similar to those for which predictions are to be made.
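For a scoring function that is linear in its terms, this training step can amount to a least-squares fit of one weight per term. The following is a minimal sketch under stated assumptions, not the procedure of any particular docking program: the descriptor columns, the `ASSUMED_WEIGHTS` values, and the function names are hypothetical, and the "affinities" are synthetic numbers generated from those weights rather than experimental data.

```python
def fit_linear_weights(descriptors, affinities):
    """Least-squares weights w solving (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(descriptors[0])
    # Build the augmented normal-equation matrix [X^T X | X^T y].
    aug = []
    for i in range(n):
        row = [sum(d[i] * d[j] for d in descriptors) for j in range(n)]
        row.append(sum(d[i] * y for d, y in zip(descriptors, affinities)))
        aug.append(row)
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptor columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(weights, descriptor_row):
    """Predicted affinity: weighted sum of the descriptor values."""
    return sum(w * x for w, x in zip(weights, descriptor_row))


# Synthetic training set (NOT experimental data). Hypothetical columns:
# intercept, hydrogen bonds, hydrophobic contacts, rotatable bonds.
TRAINING_DESCRIPTORS = [
    [1, 2, 10, 3],
    [1, 4, 5, 1],
    [1, 0, 20, 6],
    [1, 3, 12, 2],
    [1, 1, 8, 4],
    [1, 5, 15, 0],
]
# Hypothetical weights used only to generate noise-free synthetic affinities.
ASSUMED_WEIGHTS = [-1.0, -0.8, -0.1, 0.3]
TRAINING_AFFINITIES = [predict(ASSUMED_WEIGHTS, row) for row in TRAINING_DESCRIPTORS]

fitted = fit_linear_weights(TRAINING_DESCRIPTORS, TRAINING_AFFINITIES)
```

Because the synthetic affinities contain no noise, `fitted` recovers `ASSUMED_WEIGHTS` up to rounding error. With real measurements the fit would only be approximate, and its quality would depend on how closely the training complexes resemble the ones being predicted.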
For current methods that aim to predict the affinities of ligands for proteins, the following must first be known or predicted:
- Protein tertiary structure – arrangement of the protein atoms in three-dimensional space. Protein structures may be determined by experimental techniques such as X-ray crystallography or solution-phase NMR methods, or predicted by homology modelling.
- Ligand active conformation – three-dimensional shape of the ligand when bound to the protein
- Binding mode – orientation of the two binding partners relative to each other in the complex
The above information yields the three-dimensional structure of the complex. Based on this structure, the scoring function can then estimate the strength of the association between the two molecules in the complex using one of the methods outlined below. Finally, the scoring function itself may be used to help predict both the binding mode and the active conformation of the small molecule in the complex; alternatively, a simpler and computationally faster function may be utilised within the docking run.
Classes
There are three general classes of scoring functions:
- Force field – affinities are estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the two molecules in the complex. The intramolecular energies (also referred to as strain energy) of the two binding partners are also frequently included. Finally, since binding normally takes place in the presence of water, the desolvation energies of the ligand and of the protein are sometimes taken into account using implicit solvation methods such as GBSA or PBSA.
- Empirical – based on counting the number of various types of interactions between the two binding partners.[6] Counting may be based on the number of ligand and receptor atoms in contact with each other or on the change in solvent-accessible surface area (ΔSASA) in the complex compared to the uncomplexed ligand and protein. The coefficients of the scoring function are usually fit using multiple linear regression methods. The interaction terms of the function may include, for example:
  - hydrophobic–hydrophobic contacts (favorable),
  - hydrophobic–hydrophilic contacts (unfavorable),
  - hydrophilic–hydrophilic contacts (no contribution to affinity except for the following special cases):
    - number of hydrogen bonds (favorable electrostatic contribution to affinity, especially if shielded from solvent; no contribution if solvent exposed),
    - number of hydrogen bond "mismatches" or other types of electrostatic repulsion (very unfavorable and rarely seen in stable complexes),
  - number of rotatable bonds immobilized in complex formation (unfavorable entropic contribution).
- Knowledge-based – based on statistical observations of intermolecular close contacts in large 3D databases (such as the Cambridge Structural Database or Protein Data Bank) which are used to derive "potentials of mean force". This method is founded on the assumption that close intermolecular interactions between certain types of atoms or functional groups that occur more frequently than one would expect by a random distribution are likely to be energetically favorable and therefore contribute favorably to binding affinity.[9]
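The knowledge-based idea can be illustrated with an inverse-Boltzmann conversion, in which the ratio of observed to expected contact frequencies in each distance bin is turned into an energy-like value. This is a minimal sketch under stated assumptions: the function name, the choice of reference distribution, the value of `KT`, and the example counts are all hypothetical and made up for illustration; published potentials differ in how they define atom types, bins, and the reference state.

```python
import math

KT = 0.592  # approximate kT in kcal/mol near room temperature (assumed, ~298 K)


def potential_of_mean_force(observed_counts, reference_counts, kt=KT):
    """Convert per-bin contact counts into w(r) = -kT * ln(f_observed / f_reference).

    Bins with no observed or no reference contacts are returned as None,
    because the logarithm is undefined there.
    """
    observed_total = sum(observed_counts)
    reference_total = sum(reference_counts)
    potential = []
    for observed, reference in zip(observed_counts, reference_counts):
        if observed == 0 or reference == 0:
            potential.append(None)
            continue
        ratio = (observed / observed_total) / (reference / reference_total)
        potential.append(-kt * math.log(ratio))
    return potential


# Made-up counts for one hypothetical atom-type pair in four distance bins.
observed = [5, 30, 40, 25]
reference = [25, 25, 25, 25]  # assumed uniform "random" expectation
pmf = potential_of_mean_force(observed, reference)
```

Bins where contacts occur more often than the reference predicts receive negative (favorable) values, under-represented bins receive positive values, and a bin that matches the reference contributes zero.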
Finally, hybrid scoring functions have also been developed, in which components from two or more of the above classes are combined into one function.
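As a rough sketch of how components from different classes might be combined, the example below adds a force-field-style intermolecular term (Lennard-Jones plus Coulomb) to an empirical penalty per immobilized rotatable bond. It does not reproduce any published scoring function: the `Atom` layout, the Lorentz-Berthelot combining rules, the constant dielectric of 4.0, and the `rotor_penalty` value are all assumptions chosen for illustration.

```python
import math
from typing import NamedTuple

# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2).
COULOMB_CONSTANT = 332.06


class Atom(NamedTuple):
    position: tuple  # (x, y, z) in Angstrom
    charge: float    # partial charge in units of e
    epsilon: float   # Lennard-Jones well depth in kcal/mol
    sigma: float     # Lennard-Jones zero-crossing distance in Angstrom


def pair_energy(a, b, dielectric=4.0):
    """Lennard-Jones plus Coulomb energy of one ligand-receptor atom pair."""
    r = math.dist(a.position, b.position)
    # Lorentz-Berthelot combining rules (an assumed, common choice).
    epsilon = math.sqrt(a.epsilon * b.epsilon)
    sigma = 0.5 * (a.sigma + b.sigma)
    sr6 = (sigma / r) ** 6
    van_der_waals = 4.0 * epsilon * (sr6 * sr6 - sr6)
    electrostatic = COULOMB_CONSTANT * a.charge * b.charge / (dielectric * r)
    return van_der_waals + electrostatic


def force_field_term(ligand, receptor):
    """Sum pair energies over all ligand-receptor atom pairs."""
    return sum(pair_energy(l, r) for l in ligand for r in receptor)


def hybrid_score(ligand, receptor, n_rotatable_bonds, rotor_penalty=0.3):
    """Force-field term plus a hypothetical empirical penalty per frozen rotor."""
    return force_field_term(ligand, receptor) + rotor_penalty * n_rotatable_bonds


# Two neutral atoms placed at the Lennard-Jones minimum, r = 2^(1/6) * sigma.
ligand = [Atom((0.0, 0.0, 0.0), 0.0, 0.2, 3.4)]
receptor = [Atom((3.4 * 2 ** (1 / 6), 0.0, 0.0), 0.0, 0.2, 3.4)]
score = hybrid_score(ligand, receptor, n_rotatable_bonds=2)
```

In this toy case the force-field term equals minus the well depth, so the two rotor penalties outweigh it and the total is unfavorable. A real function would also need a distance cutoff, some handling of intramolecular strain and desolvation, and weights trained against measured affinities.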
Evaluation
A 2009 paper suggested that, since different scoring functions are relatively collinear (their scores are strongly correlated with one another), consensus scoring functions may not improve accuracy significantly.[10] This claim ran somewhat against the prevailing view in the field, since previous studies had suggested that consensus scoring was beneficial.[11]
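The collinearity argument can be made concrete with a small sketch that averages per-function ranks (one of several possible consensus strategies, often called rank-by-rank) and measures how strongly two functions agree. The scores below are invented for illustration, the function names are hypothetical, and the convention that a lower score means a better ligand is an assumption.

```python
import math


def ranks(scores):
    """Rank 1 goes to the lowest (assumed best) score; ties follow list order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_by_rank_consensus(score_table):
    """Average each ligand's rank over all scoring functions in the table."""
    all_ranks = [ranks(scores) for scores in score_table.values()]
    n_ligands = len(all_ranks[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n_ligands)]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long score lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return covariance / (spread_a * spread_b)


# Invented scores for four ligands from three hypothetical scoring functions.
score_table = {
    "function_a": [-9.1, -7.4, -8.2, -5.0],
    "function_b": [-45.0, -30.0, -41.0, -28.0],
    "function_c": [-8.0, -8.5, -7.0, -4.0],
}
consensus = rank_by_rank_consensus(score_table)
agreement = pearson(score_table["function_a"], score_table["function_b"])
```

When `agreement` is close to 1, the two functions order the ligands almost identically. Averaging their ranks then adds little new information, which is the intuition behind the 2009 result.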
References
- Jain AN (2006). "Scoring functions for protein-ligand docking". Curr. Protein Pept. Sci. 7 (5): 407–20. doi:10.2174/138920306778559395. PMID 17073693.
- Lensink MF, Méndez R, Wodak SJ (2007). "Docking and scoring protein complexes: CAPRI 3rd Edition". Proteins: Structure, Function, and Bioinformatics 69 (4): 704. doi:10.1002/prot.21804. PMID 17918726.
- Robertson TA, Varani G (2007). "An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure". Proteins 66 (2): 359–74. doi:10.1002/prot.21162. PMID 17078093.
- Rajamani R, Good AC (2007). "Ranking poses in structure-based lead discovery and optimization: current trends in scoring function development". Current Opinion in Drug Discovery & Development 10 (3): 308–15. PMID 17554857.
- Seifert MH, Kraus J, Kramer B (2007). "Virtual high-throughput screening of molecular databases". Current Opinion in Drug Discovery & Development 10 (3): 298–307. PMID 17554856.
- Böhm HJ (July 1998). "Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs". J. Comput. Aided Mol. Des. 12 (4): 309–23. doi:10.1023/A:1007999920146. PMID 9777490.
- Joseph-McCarthy D, Baber JC, Feyfant E, Thompson DC, Humblet C (2007). "Lead optimization via high-throughput molecular docking". Current Opinion in Drug Discovery & Development 10 (3): 264–74. PMID 17554852.
- Foloppe N, Hubbard R (2006). "Towards predictive ligand design with free-energy based computational methods?". Curr. Med. Chem. 13 (29): 3583–608. doi:10.2174/092986706779026165. PMID 17168725.
- Muegge I (2006). "PMF scoring revisited". J. Med. Chem. 49 (20): 5895–902. doi:10.1021/jm050038s. PMID 17004705.
- Englebienne P, Moitessier N (2009). "Docking Ligands into Flexible and Solvated Macromolecules. 4. Are Popular Scoring Functions Accurate for this Class of Proteins?". J Chem Inf Model 49 (6): 1568–1580. doi:10.1021/ci8004308. PMID 19445499. http://pubs.acs.org/doi/abs/10.1021/ci8004308.
- Oda A, Tsuchida K, Takakura T, Yamaotsu N, Hirono S (2006). "Comparison of consensus scoring strategies for evaluating computational models of protein-ligand complexes". J Chem Inf Model 46 (1): 380–391. doi:10.1021/ci050283k. PMID 16426072. http://pubs.acs.org/doi/abs/10.1021/ci050283k.