Structure validation: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
→‎Software and Websites: * Resolution by Proxy, ResProx - protein model resolution-by-proxy (via EdwardLinks)
m Added 12 dois to journal cites using AWB (10102)
Line 9: Line 9:


==Historical summary==
==Historical summary==
Macromolecular crystallography was preceded by the older field of small-molecule [[x-ray crystallography]] (for structures with less than a few hundred atoms). Small-molecule [[diffraction]] data extends to much higher [[Resolution (electron density)|resolution]] than feasible for macromolecules, and has a very clean mathematical relationship between the data and the atomic model. The residual, or R-factor, measures the agreement between the experimental data and the values back-calculated from the atomic model. For a well-determined small-molecule structure the R-factor is nearly as small as the uncertainty in the experimental data (well under 5%). Therefore that one test by itself provides most of the validation needed, but a number of additional consistency and methodology checks are done by automated software<ref>{{Cite journal |author=Spek AL |year=2003 |title=Single-crystal structure validation with the program PLATON |journal=J. Applied Crystallography |volume= 36 |pages=7–13 |doi=10.1107/S0021889802022112}}</ref> as a requirement for small-molecule crystal structure papers submitted to the [[International Union of Crystallography]] (IUCr) journals such as [[Acta Crystallographica]] section B or C. Atomic coordinates of these small-molecule structures are archived and accessed through the [[Cambridge Structural Database]] (CSD)<ref>{{Cite journal |author=Allen FH |year=2002 |title=The Cambridge Structural Database: a quarter of a million crystal structures and rising |journal=Acta Crystallographica |volume= B 58 |pages=380–388 |doi=10.1107/S0108768102003890}}</ref> or the [[Crystallography Open Database]] (COD).<ref>{{cite journal |author=Grazulis S, Chateigner D, Downs RT, Yokochi AT, Quiros M, Lutterotti L, Manakova E, Butkus J, Moeck, P, Le Bail A |year=2009 |title=Crystallography Open Database – an open-access collection of crystal structures |journal= Journal of Applied Crystallography |volume= 42 |pages=726–729}}</ref>
Macromolecular crystallography was preceded by the older field of small-molecule [[x-ray crystallography]] (for structures with less than a few hundred atoms). Small-molecule [[diffraction]] data extends to much higher [[Resolution (electron density)|resolution]] than feasible for macromolecules, and has a very clean mathematical relationship between the data and the atomic model. The residual, or R-factor, measures the agreement between the experimental data and the values back-calculated from the atomic model. For a well-determined small-molecule structure the R-factor is nearly as small as the uncertainty in the experimental data (well under 5%). Therefore that one test by itself provides most of the validation needed, but a number of additional consistency and methodology checks are done by automated software<ref>{{Cite journal |author=Spek AL |year=2003 |title=Single-crystal structure validation with the program PLATON |journal=J. Applied Crystallography |volume= 36 |pages=7–13 |doi=10.1107/S0021889802022112}}</ref> as a requirement for small-molecule crystal structure papers submitted to the [[International Union of Crystallography]] (IUCr) journals such as [[Acta Crystallographica]] section B or C. Atomic coordinates of these small-molecule structures are archived and accessed through the [[Cambridge Structural Database]] (CSD)<ref>{{Cite journal |author=Allen FH |year=2002 |title=The Cambridge Structural Database: a quarter of a million crystal structures and rising |journal=Acta Crystallographica |volume= B 58 |pages=380–388 |doi=10.1107/S0108768102003890}}</ref> or the [[Crystallography Open Database]] (COD).<ref>{{cite journal |author=Grazulis S, Chateigner D, Downs RT, Yokochi AT, Quiros M, Lutterotti L, Manakova E, Butkus J, Moeck, P, Le Bail A |year=2009 |title=Crystallography Open Database – an open-access collection of crystal structures |journal= Journal of Applied Crystallography |volume= 42 |pages=726–729 |doi=10.1107/s0021889809016690}}</ref>


The first macromolecular validation software was developed around 1990, for proteins. It included Rfree [[cross-validation (statistics)|cross-validation]] for model-to-data match,<ref name=Rfree>{{cite journal |author=[[Axel T. Brunger|Brunger AT]] |year=1992 |title=Free R value: a novel statistical quantity for assessing the accuracy of crystal structures |journal=Nature (London) |volume=355 |pages=472–475}}</ref> bond length and angle parameters for covalent geometry,<ref name=Engh>{{cite journal |authors=Engh RA, Huber R |year=1991 |title=Accurate bond and angle parameters for X-ray protein structure refinement |journal=Acta Crystallographica |volume=A 47 |pages=392&ndash;400}}</ref> and sidechain and backbone conformational criteria.<ref name_Ponder&Richards>{{cite journal |author=Ponder JW, Richards FM |year=1987 |title=Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes |journal=J. Molecular Biology |volume=193 |pages=775–791}}</ref><ref name=procheck>{{cite journal |author=Laskowski RA, MacArthur MW, Moss DS, [[Janet Thornton|Thornton JM]] |year=1993 |title=PROCHECK: a program to check the stereochemical quality of protein structures |journal=J. Applied Crystallography |volume=26 |pages=283–291}}</ref><ref name=whatif>{{cite journal |author=Hooft RWW, Vriend G, Sander C, Abola EE |year=1996 |title=Errors in protein structures |journal=Nature (London) |volume=381 |pages=272}}</ref> For macromolecular structures, the atomic models are deposited in the [[Protein Data Bank]] (PDB), still the single archive of this data. The PDB was established in the 1970s at [[Brookhaven National Laboratory]],<ref>{{Cite journal |author=Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, [[Olga Kennard|Kennard 0]], Shimanouchi T, Tasumi M |year=1977 |title=The Protein Data Bank: A computer-based archival file for macromolecular structures |journal=J. 
Molecular Biology |volume=112 |pages=535–542}}</ref> moved in 2000 to the [http://www.rcsb.org/pdb|RCSB] (Research Collaboration for Structural Biology) centered at [[Rutgers]],<ref>{{Cite journal |author=[[Helen M. Berman|Berman HM]], Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, [[Philip Bourne|Bourne PE]] |year=2000 |title=The Protein Data Bank |journal=Nucleic Acids Research |volume=28 |pages=235–242 |doi=10.1093/nar/28.1.235 |pmid=10592235 |pmc=102472}}</ref> and expanded in 2003 to become the [http://www.wwpdb.org/ wwPDB] (worldwide Protein Data Bank),<ref name=wwPDB>{{cite journal |author=[[Helen M. Berman|Berman HM]], Henrick K, Nakamura H |year=2003 |title=Announcing the worldwide Protein Data Bank |journal=Nature Structural & Molecular Biology |volume=10 |pages=980 |doi=10.1038/nsb1203-980 |pmid=14634627}}</ref> with access sites in Europe ([http://pdbe.org|PDBe]) and Asia ([http://www.pdbj.org|PDBj]), and with NMR data handled at the [http://www.bmrb.wisc.edu BioMagResBank (BMRB)] in Wisconsin.
The first macromolecular validation software was developed around 1990, for proteins. It included Rfree [[cross-validation (statistics)|cross-validation]] for model-to-data match,<ref name=Rfree>{{cite journal |author=[[Axel T. Brunger|Brunger AT]] |year=1992 |title=Free R value: a novel statistical quantity for assessing the accuracy of crystal structures |journal=Nature (London) |volume=355 |pages=472–475 |doi=10.1038/355472a0}}</ref> bond length and angle parameters for covalent geometry,<ref name=Engh>{{cite journal |authors=Engh RA, Huber R |year=1991 |title=Accurate bond and angle parameters for X-ray protein structure refinement |journal=Acta Crystallographica |volume=A 47 |pages=392&ndash;400}}</ref> and sidechain and backbone conformational criteria.<ref name_Ponder&Richards>{{cite journal |author=Ponder JW, Richards FM |year=1987 |title=Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes |journal=J. Molecular Biology |volume=193 |pages=775–791}}</ref><ref name=procheck>{{cite journal |author=Laskowski RA, MacArthur MW, Moss DS, [[Janet Thornton|Thornton JM]] |year=1993 |title=PROCHECK: a program to check the stereochemical quality of protein structures |journal=J. Applied Crystallography |volume=26 |pages=283–291 |doi=10.1107/s0021889892009944}}</ref><ref name=whatif>{{cite journal |author=Hooft RWW, Vriend G, Sander C, Abola EE |year=1996 |title=Errors in protein structures |journal=Nature (London) |volume=381 |pages=272 |doi=10.1038/381272a0}}</ref> For macromolecular structures, the atomic models are deposited in the [[Protein Data Bank]] (PDB), still the single archive of this data. 
The PDB was established in the 1970s at [[Brookhaven National Laboratory]],<ref>{{Cite journal |author=Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, [[Olga Kennard|Kennard 0]], Shimanouchi T, Tasumi M |year=1977 |title=The Protein Data Bank: A computer-based archival file for macromolecular structures |journal=J. Molecular Biology |volume=112 |pages=535–542 |doi=10.1016/s0022-2836(77)80200-3}}</ref> moved in 2000 to the [http://www.rcsb.org/pdb|RCSB] (Research Collaboration for Structural Biology) centered at [[Rutgers]],<ref>{{Cite journal |author=[[Helen M. Berman|Berman HM]], Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, [[Philip Bourne|Bourne PE]] |year=2000 |title=The Protein Data Bank |journal=Nucleic Acids Research |volume=28 |pages=235–242 |doi=10.1093/nar/28.1.235 |pmid=10592235 |pmc=102472}}</ref> and expanded in 2003 to become the [http://www.wwpdb.org/ wwPDB] (worldwide Protein Data Bank),<ref name=wwPDB>{{cite journal |author=[[Helen M. Berman|Berman HM]], Henrick K, Nakamura H |year=2003 |title=Announcing the worldwide Protein Data Bank |journal=Nature Structural & Molecular Biology |volume=10 |pages=980 |doi=10.1038/nsb1203-980 |pmid=14634627}}</ref> with access sites in Europe ([http://pdbe.org|PDBe]) and Asia ([http://www.pdbj.org|PDBj]), and with NMR data handled at the [http://www.bmrb.wisc.edu BioMagResBank (BMRB)] in Wisconsin.
Validation rapidly became standard in the field,<ref name=Kleywegt2000>{{cite journal |author= Kleywegt GJ |year= 2000 |title= Validation of protein crystal structures |journal=Acta Crystallogr |volume=D 56 |pages=18–19}}</ref> with further developments described below. *Obviously needs expansion*
Validation rapidly became standard in the field,<ref name=Kleywegt2000>{{cite journal |author= Kleywegt GJ |year= 2000 |title= Validation of protein crystal structures |journal=Acta Crystallogr |volume=D 56 |pages=18–19}}</ref> with further developments described below. *Obviously needs expansion*
Line 44: Line 44:


====Geometry====
====Geometry====
<ref name=Engh/><ref name=Gelbin>{{cite journal |author=Gelbin A, Schneider B, Clowney L, Hsieh S-H, Olson WK, [[Helen M. Berman|Berman HM]] |year=1996 |title=Geometric parameters in Nucleic Acids:Sugar and Phosphate Constituents |journal=J Amer Chem Soc |volume=118 |pages=519–529}}</ref><ref>{{cite journal |author=Schultze P, Feigon J |year=1997 |title=Chirality errors in nucleic acid structures |journal=Nature (London) |volume=387 |pages=668}}</ref>
<ref name=Engh/><ref name=Gelbin>{{cite journal |author=Gelbin A, Schneider B, Clowney L, Hsieh S-H, Olson WK, [[Helen M. Berman|Berman HM]] |year=1996 |title=Geometric parameters in Nucleic Acids:Sugar and Phosphate Constituents |journal=J Amer Chem Soc |volume=118 |pages=519–529 |doi=10.1021/ja9528846}}</ref><ref>{{cite journal |author=Schultze P, Feigon J |year=1997 |title=Chirality errors in nucleic acid structures |journal=Nature (London) |volume=387 |pages=668}}</ref>


====Conformation (Dihedrals): Protein & RNA====
====Conformation (Dihedrals): Protein & RNA====
Line 62: Line 62:
* [http://molprobity.biochem.duke.edu/ MolProbity web service]
* [http://molprobity.biochem.duke.edu/ MolProbity web service]
* [http://swift.cmbi.ru.nl/gv/pdbreport/index.html Protein structure validation database [[PDBREPORT]].]
* [http://swift.cmbi.ru.nl/gv/pdbreport/index.html Protein structure validation database [[PDBREPORT]].]
* [http://eds.bmc.uu.se/eds/ EDS (Electron Density Server)<ref>{{Cite journal |author=Kleywegt G.J., Harris MR, Zou JY, Taylor TC, Wahlby A, Jones TA |year=2004 |title=The Uppsala Electron-Density Server |journal=Acta Crystallographica D |volume= 60 |pages=2240–2249}}</ref>]
* [http://eds.bmc.uu.se/eds/ EDS (Electron Density Server)<ref>{{Cite journal |author=Kleywegt G.J., Harris MR, Zou JY, Taylor TC, Wahlby A, Jones TA |year=2004 |title=The Uppsala Electron-Density Server |journal=Acta Crystallographica D |volume= 60 |pages=2240–2249 |doi=10.1107/s0907444904013253}}</ref>]
* [http://swift.cmbi.ru.nl/gv/whatcheck/ What_Check software]
* [http://swift.cmbi.ru.nl/gv/whatcheck/ What_Check software]
* [http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/ ProCheck software]
* [http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/ ProCheck software]
* [http://www.biop.ox.ac.uk/coot/ Coot modeling software (built-in validation)<ref>{{Cite journal |author=Emsley P, Lohkamp B, Scott WG, Cowtan K |year=2010 |title=Features and development of Coot |journal=Acta Crystallographica D |volume= 66 |pages=486–501}}</ref>]
* [http://www.biop.ox.ac.uk/coot/ Coot modeling software (built-in validation)<ref>{{Cite journal |author=Emsley P, Lohkamp B, Scott WG, Cowtan K |year=2010 |title=Features and development of Coot |journal=Acta Crystallographica D |volume= 66 |pages=486–501 |doi=10.1107/s0907444910007493}}</ref>]
* [http://xray.bmc.uu.se/usf/ OOPS2, part of the Uppsala Software Factory]
* [http://xray.bmc.uu.se/usf/ OOPS2, part of the Uppsala Software Factory]
* [https://prosa.services.came.sbg.ac.at/prosa.php ProSA web service]
* [https://prosa.services.came.sbg.ac.at/prosa.php ProSA web service]
* [http://nihserver.mbi.ucla.edu/Verify_3D/ Verify-3D profile analysis]
* [http://nihserver.mbi.ucla.edu/Verify_3D/ Verify-3D profile analysis]
* [http://www.cmbi.ru.nl/pdb_redo/ PDB_REDO optimized X-ray structure models<ref>{{Cite journal |author=Joosten RP, Joosten K, Murshudov GN, Perrakis A |year=2012 |title=PDB_REDO: constructive validation, more than just looking for errors |journal=Acta Crystallographica D |volume= 68 |pages=484–496}}</ref>]
* [http://www.cmbi.ru.nl/pdb_redo/ PDB_REDO optimized X-ray structure models<ref>{{Cite journal |author=Joosten RP, Joosten K, Murshudov GN, Perrakis A |year=2012 |title=PDB_REDO: constructive validation, more than just looking for errors |journal=Acta Crystallographica D |volume= 68 |pages=484–496 |doi=10.1107/s0907444911054515}}</ref>]


==For NMR (Nuclear Magnetic Resonance)==
==For NMR (Nuclear Magnetic Resonance)==
Line 94: Line 94:
* [http://resprox.ca/ ResProx - protein model resolution-by-proxy]
* [http://resprox.ca/ ResProx - protein model resolution-by-proxy]
* [http://vadar.wishartlab.com/ VADAR - Volume, Area, Dihedral Angle Reporter]
* [http://vadar.wishartlab.com/ VADAR - Volume, Area, Dihedral Angle Reporter]
* [http://psvs-1_4-dev.nesg.org/ PSVS (Protein Structure Validation Server at the NESG)<ref>{{Cite journal |author=Huang YJ, Powers R and Montelione GT |year=2005 |title=Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics |journal=Journal of Biomolecular NMR |volume= 127 |pages=1665–1674}}</ref>]
* [http://psvs-1_4-dev.nesg.org/ PSVS (Protein Structure Validation Server at the NESG)<ref>{{Cite journal |author=Huang YJ, Powers R and Montelione GT |year=2005 |title=Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics |journal=Journal of Biomolecular NMR |volume= 127 |pages=1665–1674 |doi=10.1021/ja047109h}}</ref>]
* [http://code.google.com/p/cing/ CING (Common Interface for NMR structure Generation) software]
* [http://code.google.com/p/cing/ CING (Common Interface for NMR structure Generation) software]
* [http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/ ProCheckNMR software<ref>{{Cite journal |author=Laskowski RA, Rullmannn JA, MacArthur MW, Kaptein R, Thornton JM |year=1996 |title=AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR |journal=Journal of the American Chemical Society |volume= 8 |pages=477–486}}</ref>]
* [http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/ ProCheckNMR software<ref>{{Cite journal |author=Laskowski RA, Rullmannn JA, MacArthur MW, Kaptein R, Thornton JM |year=1996 |title=AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR |journal=Journal of the American Chemical Society |volume= 8 |pages=477–486 |doi=10.1007/bf00228148}}</ref>]
* [http://molprobity.biochem.duke.edu/ MolProbity (includes analyses for NMR)]
* [http://molprobity.biochem.duke.edu/ MolProbity (includes analyses for NMR)]
* [http://spin.niddk.nih.gov/bax/nmrserver/talos/ TALOS+ Software & Server(server for predicting protein backbone torsion angles from chemical shift)]
* [http://spin.niddk.nih.gov/bax/nmrserver/talos/ TALOS+ Software & Server(server for predicting protein backbone torsion angles from chemical shift)]
Line 110: Line 110:


==For Computational Biology==
==For Computational Biology==
It is difficult to do meaningful validation of an individual, purely computational, macromolecular model in the absence of experimental data for that molecule, because the model with the best geometry and conformational score may not be the one closest to the right answer. Therefore, much of the emphasis in validation of computational modeling is in assessment of the methods. To avoid bias and wishful thinking, double-blind prediction competitions have been organized, the original example of which (held every 2 years since 1994) is [[CASP]] (Critical Assessment of Structure Prediction) to evaluate predictions of 3D protein structure for newly solved [[x-ray crystallography|crystallographic]] or [[Nuclear magnetic resonance|NMR]] structures held in confidence until the end of the relevant competition.<ref>{{cite journal |author=Moult, J., ''et al.'' |year=1995 |title=A large-scale experiment to assess protein structure prediction methods |journal=Proteins |volume=23 |issue=3 |pages=ii–iv }}</ref> The major critierion for CASP evaluation is a weighted score called GDT-TS for the match of Calpha positions between the predicted and the experimental models.<ref>{{cite journal |doi=10.1093/nar/gkg571 |author=Zemla, A. |year=2003 |title=LGA: a method for finding 3D similarities in protein structures |journal=Nucleic Acids Research |volume=31 |issue=13 |pages=3370–4 |pmid=12824330 |pmc=168977 }}</ref>
It is difficult to do meaningful validation of an individual, purely computational, macromolecular model in the absence of experimental data for that molecule, because the model with the best geometry and conformational score may not be the one closest to the right answer. Therefore, much of the emphasis in validation of computational modeling is in assessment of the methods. To avoid bias and wishful thinking, double-blind prediction competitions have been organized, the original example of which (held every 2 years since 1994) is [[CASP]] (Critical Assessment of Structure Prediction) to evaluate predictions of 3D protein structure for newly solved [[x-ray crystallography|crystallographic]] or [[Nuclear magnetic resonance|NMR]] structures held in confidence until the end of the relevant competition.<ref>{{cite journal |author=Moult, J., ''et al.'' |year=1995 |title=A large-scale experiment to assess protein structure prediction methods |journal=Proteins |volume=23 |issue=3 |pages=ii–iv |doi=10.1002/prot.340230303}}</ref> The major critierion for CASP evaluation is a weighted score called GDT-TS for the match of Calpha positions between the predicted and the experimental models.<ref>{{cite journal |doi=10.1093/nar/gkg571 |author=Zemla, A. |year=2003 |title=LGA: a method for finding 3D similarities in protein structures |journal=Nucleic Acids Research |volume=31 |issue=13 |pages=3370–4 |pmid=12824330 |pmc=168977 }}</ref>


===Software and Websites===
===Software and Websites===

Revision as of 08:20, 8 May 2014

Structure validation concept: model of a protein (each ball is an atom), and magnified region with electron density data and 3 bright flags for problems

Macromolecular structure validation is the process of evaluating the reliability of three-dimensional atomic models of large biological molecules such as proteins and nucleic acids. These models, which provide 3D coordinates for each atom in the molecule (see example in the image), come from structural biology experiments such as x-ray crystallography[1] or nuclear magnetic resonance (NMR).[2] The validation has three aspects: 1) checking the validity of the thousands to millions of measurements in the experiment; 2) checking how consistent the atomic model is with those experimental data; and 3) checking the consistency of the model with known physical and chemical properties.

Proteins and nucleic acids are the workhorses of biology, providing the necessary chemical reactions, structural organization, growth, mobility, reproduction, and environmental sensitivity. Essential to their biological functions are the detailed 3D structures of the molecules and the changes in those structures. To understand and control those functions, we need accurate knowledge about the models that represent those structures, including their many strong points and their occasional weaknesses.

End-users of macromolecular models include clinicians, teachers and students, as well as the structural biologists themselves, journal editors and referees, experimentalists studying the macromolecules by other techniques, and theoreticians and bioinformaticians studying more general properties of biological molecules. Their interests and requirements vary, but all benefit greatly from a global and local understanding of the reliability of the models.

Historical summary

Macromolecular crystallography was preceded by the older field of small-molecule x-ray crystallography (for structures with fewer than a few hundred atoms). Small-molecule diffraction data extend to much higher resolution than is feasible for macromolecules, and have a very clean mathematical relationship to the atomic model. The residual, or R-factor, measures the agreement between the experimental data and the values back-calculated from the atomic model. For a well-determined small-molecule structure the R-factor is nearly as small as the uncertainty in the experimental data (well under 5%). That one test therefore provides most of the validation needed, but a number of additional consistency and methodology checks are done by automated software[3] as a requirement for small-molecule crystal structure papers submitted to International Union of Crystallography (IUCr) journals such as Acta Crystallographica section B or C. Atomic coordinates of these small-molecule structures are archived and accessed through the Cambridge Structural Database (CSD)[4] or the Crystallography Open Database (COD).[5]
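The R-factor described above is conventionally computed as the sum of absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The following is a minimal sketch of that arithmetic, not the code of any refinement program. The function name `r_factor` and the amplitude values are hypothetical, and the sketch assumes the calculated amplitudes have already been put on the same scale as the observed ones.

```python
def r_factor(f_obs, f_calc):
    """Return sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over matched reflections.

    Assumes f_calc is already scaled to f_obs; real programs also fit a scale factor.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


# Hypothetical amplitudes, invented purely for illustration.
observed = [100.0, 50.0, 25.0, 25.0]
calculated = [96.0, 52.0, 24.0, 27.0]
print(f"R = {r_factor(observed, calculated):.3f}")
```

With these made-up numbers the residual is 9/200, or 4.5%, which falls in the "well under 5%" range the text associates with a well-determined small-molecule structure.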

The first macromolecular validation software was developed around 1990, for proteins. It included Rfree cross-validation for model-to-data match,[6] bond length and angle parameters for covalent geometry,[7] and sidechain and backbone conformational criteria.[8][9][10] For macromolecular structures, the atomic models are deposited in the Protein Data Bank (PDB), still the single archive of this data. The PDB was established in the 1970s at Brookhaven National Laboratory,[11] moved in 2000 to the RCSB (Research Collaboratory for Structural Bioinformatics) centered at Rutgers,[12] and expanded in 2003 to become the wwPDB (worldwide Protein Data Bank),[13] with access sites in Europe (PDBe) and Asia (PDBj), and with NMR data handled at the BioMagResBank (BMRB) in Wisconsin.

Validation rapidly became standard in the field,[14] with further developments described below.

The applicability of comprehensive validation for both x-ray and NMR received a large boost on February 1, 2008, when the worldwide Protein Data Bank (wwPDB) made deposition of experimental data mandatory along with atomic coordinates. As of 2012, strong forms of validation were being adopted for wwPDB deposition, based on recommendations of the wwPDB Validation Task Force committees for x-ray crystallography,[15] for NMR, for SAXS (small-angle x-ray scattering), and for cryoEM (cryo-Electron Microscopy).[16]

Validation for Crystallography (X-ray and Neutron)

Overall Considerations

Global vs Local Criteria

Many evaluation criteria apply globally to an entire experimental structure, most notably the resolution, the anisotropy or incompleteness of the data, and the residual or R-factor that measures overall model-to-data match (see below). Those help a user choose the most accurate among related Protein Data Bank entries to answer their questions. Other criteria apply to individual residues or local regions in the 3D structure, such as fit to the local electron density map or steric clashes between atoms. Those are especially valuable to the structural biologist for making improvements to the model, and to the user for evaluating the reliability of that model right around the place they care about - such as a site of enzyme activity or drug binding. Both types of measures are very useful, but although global criteria are easier to state or publish, local criteria make the greatest contribution to scientific accuracy and biological relevance. As expressed in the Rupp textbook, "Only local validation, including assessment of both geometry and electron density, can give an accurate picture of the reliability of the structure model or any hypothesis based on local features of the model."[17]

What can be seen in low vs high resolution macromolecular crystal structures

Relationship to Resolution and B-factor

Data Validation

Structure Factors

Twinning

Model-to-Data Validation

Residuals and Rfree

Real-space Correlation

Model Validation

Geometry

[7][18][19]

Conformation (Dihedrals): Protein & RNA

The backbone and sidechain dihedral angles of proteins and RNA have been shown to occur only in specific allowed combinations.
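A dihedral angle is defined by four consecutive atoms and measures the rotation about the bond between the middle two. The sketch below shows one common way to compute it from coordinates; it is not the implementation used by any particular validation program. The function name `dihedral_degrees` and the test coordinates are hypothetical. Deciding whether a combination of angles is "allowed" additionally requires reference distributions from high-quality structures, which are not included here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0:
        raise ValueError("central atoms coincide")
    axis = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    # atan2 keeps the sign, so +60 and -60 are distinguished.
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))


# Hypothetical coordinates: the outer atoms are a quarter turn apart.
angle = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(f"dihedral = {angle:.1f} degrees")
```

Applied to backbone atoms, the same function would give phi from C(i-1), N, CA, C and psi from N, CA, C, N(i+1), the pair plotted in a Ramachandran diagram.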

Sterics and Packing

Carbohydrates and Ligands

Improvement by Correcting Diagnosed Problems

Software and Websites

For NMR (Nuclear Magnetic Resonance)

Data Validation: Chemical Shifts, NOEs, RDCs

PROSESS. PROSESS (Protein Structure Evaluation Suite & Server) is the first web server that offered an assessment of protein structural models by NMR chemical shifts as well as NOEs, geometrical, and knowledge-based parameters.

AVS. The Assignment Validation Suite (AVS) checks a chemical shift list in BioMagResBank (BMRB) format for problems.

LACS. Linear analysis of chemical shifts is used for absolute referencing of chemical shift data.

Model-to-Data Validation

TALOS+. Predicts protein backbone torsion angles from chemical shift data. Frequently used to generate further restraints applied to a structure model during refinement.

Model Validation: As Above

NMR structural ensemble for PDB file 2K5D, with well-defined structure for the beta strands (arrows) and undefined, presumably highly mobile regions for the orange loop and the blue N-terminus

Dynamics: Core vs Loops, Tails, and Mobile Domains

One of the critical needs for NMR structural ensemble validation is to distinguish well-determined regions (those that have experimental data) from regions that are highly mobile and/or have no observed data. There are several current or proposed methods for making this distinction such as Random Coil Index, but so far the NMR community has not standardized on one.

Software and Websites

For Cryo-Electron Microscopy and Hybrid Methods

Software and Websites

For SAXS (Small-Angle X-ray Scattering)

For Computational Biology

It is difficult to do meaningful validation of an individual, purely computational, macromolecular model in the absence of experimental data for that molecule, because the model with the best geometry and conformational score may not be the one closest to the right answer. Therefore, much of the emphasis in validation of computational modeling is on assessment of the methods. To avoid bias and wishful thinking, double-blind prediction competitions have been organized. The original example, held every 2 years since 1994, is CASP (Critical Assessment of Structure Prediction), which evaluates predictions of 3D protein structure against newly solved crystallographic or NMR structures held in confidence until the end of the relevant competition.[25] The major criterion for CASP evaluation is a weighted score called GDT-TS for the match of Cα positions between the predicted and the experimental models.[26]
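GDT-TS is usually reported as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms in the prediction that lie within that cutoff of the corresponding experimental atom. The sketch below illustrates only that final averaging step. It assumes the two models have already been superimposed and their residues paired, whereas the full procedure searches over many superpositions to maximize the count at each cutoff. The function name `gdt_ts` and the coordinates are hypothetical.

```python
import math


def gdt_ts(predicted_ca, experimental_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of C-alpha atoms within each cutoff (in angstroms).

    Assumes both coordinate lists are already superimposed and paired by residue.
    """
    if not predicted_ca or len(predicted_ca) != len(experimental_ca):
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted_ca, experimental_ca)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: deviations of 0.5, 1.5, 3.0 and 10.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = gdt_ts(predicted, experimental)
print(f"GDT-TS = {score:.2f}")
```

In this made-up case one, two, three and three of the four atoms fall within the 1, 2, 4 and 8 Å cutoffs respectively, so the score is the mean of 25%, 50%, 75% and 75%.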

Software and Websites

References

  1. ^ Rupp 2009
  2. ^ Cavanagh 2006
  3. ^ Spek AL (2003). "Single-crystal structure validation with the program PLATON". J. Applied Crystallography. 36: 7–13. doi:10.1107/S0021889802022112.
  4. ^ Allen FH (2002). "The Cambridge Structural Database: a quarter of a million crystal structures and rising". Acta Crystallographica. B 58: 380–388. doi:10.1107/S0108768102003890.
  5. ^ Grazulis S, Chateigner D, Downs RT, Yokochi AT, Quiros M, Lutterotti L, Manakova E, Butkus J, Moeck P, Le Bail A (2009). "Crystallography Open Database – an open-access collection of crystal structures". Journal of Applied Crystallography. 42: 726–729. doi:10.1107/s0021889809016690.
  6. ^ Brunger AT (1992). "Free R value: a novel statistical quantity for assessing the accuracy of crystal structures". Nature (London). 355: 472–475. doi:10.1038/355472a0.
  7. ^ a b Engh RA, Huber R (1991). "Accurate bond and angle parameters for X-ray protein structure refinement". Acta Crystallographica. A 47: 392–400.
  8. ^ Ponder JW, Richards FM (1987). "Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes". J. Molecular Biology. 193: 775–791.
  9. ^ Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993). "PROCHECK: a program to check the stereochemical quality of protein structures". J. Applied Crystallography. 26: 283–291. doi:10.1107/s0021889892009944.
  10. ^ Hooft RWW, Vriend G, Sander C, Abola EE (1996). "Errors in protein structures". Nature (London). 381: 272. doi:10.1038/381272a0.
  11. ^ Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M (1977). "The Protein Data Bank: A computer-based archival file for macromolecular structures". J. Molecular Biology. 112: 535–542. doi:10.1016/s0022-2836(77)80200-3.
  12. ^ Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000). "The Protein Data Bank". Nucleic Acids Research. 28: 235–242. doi:10.1093/nar/28.1.235. PMC 102472. PMID 10592235.
  13. ^ Berman HM, Henrick K, Nakamura H (2003). "Announcing the worldwide Protein Data Bank". Nature Structural & Molecular Biology. 10: 980. doi:10.1038/nsb1203-980. PMID 14634627.
  14. ^ Kleywegt GJ (2000). "Validation of protein crystal structures". Acta Crystallogr. D 56: 18–19.
  15. ^ Read RJ, Adams PD, Arendall WB III, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lutteke T, Otwinowski Z, Perrakis A, Richardson JS, Sheffler WH, Smith JL, Tickle IJ, Vriend G, Zwart PH (2011). "A New Generation of Crystallographic Validation Tools for the Protein Data Bank". Structure. 19: 1395–1412. doi:10.1016/j.str.2011.08.006. PMC 3195755. PMID 22000512.
  16. ^ Henderson R, Sali A, Baker ML, Carragher B, Devkota B, Downing KH, Egelman EH, Feng Z, Frank J, Grigorieff N, Jiang W, Ludtke SJ, Medalia O, Penczek PA, Rosenthal PB, Rossmann MG, Schmid MF, Schroeder GF, Steven AC, Stokes DL, Westbrook JD, Wriggers W, Yang H, Young J, Berman HM, Chiu W, Kleywegt GJ, Lawson CL (2012). "Outcome of the First Electron Microscopy Validation Task Force Meeting". Structure. 20: 205–214. doi:10.1016/j.str.2011.12.014.
  17. ^ Rupp 2009, Chapter 13, Key Concepts
  18. ^ Gelbin A, Schneider B, Clowney L, Hsieh S-H, Olson WK, Berman HM (1996). "Geometric parameters in Nucleic Acids: Sugar and Phosphate Constituents". J Amer Chem Soc. 118: 519–529. doi:10.1021/ja9528846.
  19. ^ Schultze P, Feigon J (1997). "Chirality errors in nucleic acid structures". Nature (London). 387: 668.
  20. ^ Kleywegt GJ, Harris MR, Zou JY, Taylor TC, Wahlby A, Jones TA (2004). "The Uppsala Electron-Density Server". Acta Crystallographica D. 60: 2240–2249. doi:10.1107/s0907444904013253.
  21. ^ Emsley P, Lohkamp B, Scott WG, Cowtan K (2010). "Features and development of Coot". Acta Crystallographica D. 66: 486–501. doi:10.1107/s0907444910007493.
  22. ^ Joosten RP, Joosten K, Murshudov GN, Perrakis A (2012). "PDB_REDO: constructive validation, more than just looking for errors". Acta Crystallographica D. 68: 484–496. doi:10.1107/s0907444911054515.
  23. ^ Huang YJ, Powers R, Montelione GT (2005). "Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics". Journal of the American Chemical Society. 127: 1665–1674. doi:10.1021/ja047109h.
  24. ^ Laskowski RA, Rullmann JA, MacArthur MW, Kaptein R, Thornton JM (1996). "AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR". Journal of Biomolecular NMR. 8: 477–486. doi:10.1007/bf00228148.
  25. ^ Moult, J.; et al. (1995). "A large-scale experiment to assess protein structure prediction methods". Proteins. 23 (3): ii–iv. doi:10.1002/prot.340230303.
  26. ^ Zemla, A. (2003). "LGA: a method for finding 3D similarities in protein structures". Nucleic Acids Research. 31 (13): 3370–4. doi:10.1093/nar/gkg571. PMC 168977. PMID 12824330.

Further reading

  • Cavanagh, John, Fairbrother, Wayne J., Palmer, Arthur G. III, Skelton, Nicholas J. (2006). Protein NMR Spectroscopy: Principles and Practice (2nd ed.). Academic Press. ISBN 0-12-164491-X.
  • Rupp, Bernhard (2009). Biomolecular Crystallography: Principles, Practice, and Application to Structural Biology. Garland Science. ISBN 0815340818.