Protein chemical shift prediction

Protein chemical shift prediction is a branch of biomolecular nuclear magnetic resonance spectroscopy that aims to accurately calculate protein chemical shifts from protein coordinates. Protein chemical shift prediction was first attempted in the late 1960s using semi-empirical methods applied to protein structures solved by X-ray crystallography.^[1] Since that time protein chemical shift prediction has evolved to employ much more sophisticated approaches including quantum mechanics, machine learning and empirically derived chemical shift hypersurfaces.^[1] The most recently developed methods exhibit remarkable precision and accuracy.

Protein chemical shifts

NMR chemical shifts are often called the mileposts of nuclear magnetic resonance spectroscopy. Chemists have used chemical shifts for more than 50 years as highly reproducible, easily measured parameters to map out the covalent structure of small organic molecules. Indeed, the sensitivity of NMR chemical shifts to the type and character of neighbouring atoms, combined with their reasonably predictable tendencies, has made them invaluable for both deciphering and describing the structure of thousands of newly synthesized or newly isolated compounds.^[1]^[2]^[3]^[4] The same sensitivity to a variety of important protein structural features has made protein chemical shifts equally valuable to protein chemists and biomolecular NMR spectroscopists.^[4] In particular, protein chemical shifts are sensitive not only to substituent or covalent atom effects (such as electronegativity, redox states or ring currents) but also to backbone torsion angles (i.e. secondary structure), hydrogen bonding, local atomic motions and solvent accessibility.

Importance of protein chemical shift prediction

Predicted or estimated protein chemical shifts can be used to assist with the chemical shift assignment process. This is especially true if a similar (or identical) protein structure has been solved by X-ray crystallography. In this case, the three-dimensional structure can be used to estimate what the NMR chemical shifts should be and thereby simplify the process of assigning the experimentally observed chemical shifts. Predicted chemical shifts can also be used to identify incorrect assignments, to correct mis-referenced chemical shifts, to optimize protein structures via chemical shift refinement and to identify the relative contributions of different electronic or geometric effects to nucleus-specific shifts.^[1] Protein chemical shifts can also be used to identify secondary structures, to estimate backbone torsion angles, to determine the location of aromatic rings, to assess cysteine oxidation states, to estimate solvent exposure and to measure backbone flexibility.^[4]
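As a rough illustration of how predicted shifts can expose a referencing error, the sketch below estimates a constant offset as the median difference between observed and predicted shifts for one nucleus type, then subtracts it. The function names and the example values are hypothetical and are not taken from any of the programs discussed here; real re-referencing tools use more elaborate statistics.

```python
from statistics import median


def estimate_reference_offset(observed, predicted):
    """Median of (observed - predicted) over residues present in both sets.

    Both arguments map a residue number to a shift in ppm for a single
    nucleus type. The median is used so that a few mis-assigned residues
    do not dominate the estimate.
    """
    shared = sorted(set(observed) & set(predicted))
    if not shared:
        raise ValueError("no residues in common")
    return median(observed[k] - predicted[k] for k in shared)


def apply_offset(observed, offset):
    """Return a copy of the observed shifts with the offset removed."""
    return {k: v - offset for k, v in observed.items()}


# Made-up 13C-alpha shifts (ppm) keyed by residue number; illustrative only.
predicted = {1: 56.0, 2: 58.5, 3: 52.3, 4: 61.0}
observed = {1: 58.5, 2: 61.0, 3: 54.8, 4: 63.5}

offset = estimate_reference_offset(observed, predicted)
corrected = apply_offset(observed, offset)
```

A large, roughly uniform offset across all residues suggests a referencing problem, whereas isolated large deviations are more likely to indicate individual mis-assignments.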

Progress in chemical shift prediction programs

Significant progress in chemical shift prediction has been made through continuous improvements in our understanding of the key physico-chemical factors contributing to chemical shift changes. These improvements have also been helped along through significant computational advancements^[5]^[6]^[7]^[8] and the rapid expansion of biomolecular chemical shift databases.^[9]^[10] Over the past four decades, at least three different methods for calculating or predicting protein chemical shifts have emerged. The first is based on using sequence/structure alignment against protein chemical shift databases, the second is based on directly calculating shifts from atomic coordinates, and the third is based on using a combination of the two approaches.^[1]^[4]

Predicting shifts via sequence homology: these methods are based on the simple observation that similar protein sequences share similar structures and similar chemical shifts.^[1]^[3]
Predicting shifts from coordinate data / structure:
- Semi-classical methods: employ empirical equations derived from classical physics and experimental data^[1]
- Quantum mechanical (QM) methods: employ density functional theory (DFT)^[1]^[2]
- Empirical methods: rely on chemical shift "hypersurfaces" or related "structure/shift" tables^[1]
Hybrid methods: combine the two approaches above^[1]
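The idea behind a hybrid method can be sketched as follows: fall back on the structure-based prediction when no sufficiently similar homologue is available, and otherwise blend the two predictions. This is only a minimal illustration under stated assumptions. The function name, the identity cutoff and the fixed weight are hypothetical and do not reproduce the combination rules of any specific program.

```python
from typing import Optional


def combine_predictions(structure_based: float,
                        homology_based: Optional[float],
                        sequence_identity: float,
                        identity_cutoff: float = 0.4,
                        homology_weight: float = 0.5) -> float:
    """Blend a structure-based and a homology-based shift prediction (ppm).

    identity_cutoff and homology_weight are arbitrary illustrative values,
    not parameters published for any real predictor.
    """
    # No usable homologue: rely on the coordinate-based calculation alone.
    if homology_based is None or sequence_identity < identity_cutoff:
        return structure_based
    # Otherwise take a weighted average of the two estimates.
    return (homology_weight * homology_based
            + (1.0 - homology_weight) * structure_based)


no_homologue = combine_predictions(56.0, None, 0.9)
weak_homologue = combine_predictions(56.0, 58.0, 0.2)
blended = combine_predictions(56.0, 58.0, 0.9)
```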

The emergence of hybrid prediction methods

By the early 2000s, several research groups realized that protein chemical shifts could be calculated more efficiently and accurately by combining different methods, as shown in Figure 1. This led to the development of several programs and web servers that rapidly calculate protein chemical shifts when provided with protein coordinate data.^[1] These "hybrid" programs, along with some of their features and URLs, are listed below in Table 1.

Summary of protein chemical shift prediction programs

Table 1: Currently available protein chemical shift prediction programs
Name	Method	Website
SHIFTCALC^[11]	Hybrid – empirical chemical shift hypersurfaces in combination with semi-classical calculations	https://archive.today/20140324204821/http://nmr.group.shef.ac.uk/NMR/mainpage.html
SHIFTS^[12]	Hybrid – QM chemical shift hypersurfaces combined with semi-classical calculations	http://casegroup.rutgers.edu/qshifts/qshifts.htm
CheSHIFT^[13]	QM calculated chemical shift hypersurfaces	http://cheshift.com/
SHIFTX^[2]	Hybrid – empirical chemical shift hypersurfaces in combination with semi-classical calculations	http://shiftx.wishartlab.com
PROSHIFT^[14]	Neural network model using atomic parameters and sequence information	http://www.meilerlab.org/index.php/servers/show?s_id=9
SPARTA^[15]	Hybrid – sequence and shift matching to a database combined with semi-classical calculations	http://spin.niddk.nih.gov/bax/software/SPARTA/index.html
SPARTA+^[16]	Hybrid – sequence and shift matching to a database combined with semi-classical calculations and an artificial neural network	http://spin.niddk.nih.gov/bax/software/SPARTA+/
CAMSHIFT^[17]	Distance-based method in combination with parameterized polynomial expansion	https://web.archive.org/web/20140109151911/http://www-vendruscolo.ch.cam.ac.uk/camshift/camshift.php
SHIFTX2^[4]	Hybrid – machine learning using atomic parameters combined with semi-classical calculations (SHIFTX+), merged through ensemble rules with sequence homology-based prediction (SHIFTY+)	http://www.shiftx2.ca http://www.wishartlab.com

Performance comparison of modern protein chemical shift prediction programs

Figure 2 lists the correlation coefficients between the experimentally observed backbone chemical shifts and the calculated/predicted backbone shifts for different chemical shift predictors, evaluated on an identical set of 61 test proteins.
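For readers who want to reproduce this kind of comparison on their own data, the following sketch computes the Pearson correlation coefficient and the root-mean-square deviation (RMSD) between observed and predicted shifts for one nucleus type. The function names are illustrative, and RMSD is included here as a commonly reported companion metric rather than something stated in the comparison above.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


def rmsd(observed, predicted):
    """Root-mean-square deviation in the same units as the input (ppm)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(total / len(observed))
```

Because chemical shift ranges differ greatly between nuclei, both metrics are normally computed separately for each atom type rather than pooled.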

Coverage and speed

Different methods have different levels of coverage and rates of calculation. Some methods only calculate or predict chemical shifts for backbone atoms (6 atom types). Some calculate chemical shifts for backbone and certain side chain atoms (C and N only) and still others are able to calculate shifts for all atoms (40 atom types). For chemical shift refinement there is a need for rapid calculation as thousands of structures are generated during a molecular dynamics or simulated annealing run and their chemical shifts must be calculated equally rapidly.

Program	No. of atom types predicted	Speed (seconds/100 residues)
SHIFTX	27	0.59
SPARTA	6 (backbone only)	17.92
SPARTA+	6 (backbone only)	2.47
CamShift	6 (backbone only)	0.91
SHIFTS	31	3.66
PROSHIFT	40	12.82
SHIFTX2	40	2.10

All the computational speed tests for SPARTA, SPARTA+, SHIFTS, CamShift, SHIFTX and SHIFTX2 were performed on the same computer using the same set of proteins. The calculation speed reported for PROSHIFT is based on the response rate of its web server.^[4]
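To see what these rates imply for chemical shift refinement, the sketch below normalizes a measured run time to seconds per 100 residues and estimates the total cost of evaluating many structures. It assumes that run time scales linearly with the number of residues, which is a simplification. The function names are hypothetical, and the rates are the locally benchmarked values from the table above.

```python
def seconds_per_100_residues(elapsed_seconds: float, n_residues: int) -> float:
    """Normalize a measured run time to the units used in the table."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return 100.0 * elapsed_seconds / n_residues


def estimated_refinement_seconds(rate_per_100: float,
                                 n_residues: int,
                                 n_structures: int) -> float:
    """Total shift-calculation time, assuming linear scaling with size."""
    return rate_per_100 * (n_residues / 100.0) * n_structures


# Rates (seconds/100 residues) copied from the table above.
RATES = {
    "SHIFTX": 0.59,
    "SPARTA": 17.92,
    "SPARTA+": 2.47,
    "CamShift": 0.91,
    "SHIFTS": 3.66,
    "SHIFTX2": 2.10,
}

# Estimated cost of scoring 1000 structures of a 200-residue protein.
costs = {name: estimated_refinement_seconds(rate, 200, 1000)
         for name, rate in RATES.items()}
```

Under this assumption, the spread between the fastest and slowest programs amounts to a difference of minutes versus hours for a single refinement run.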

References

[1] Wishart, DS (Feb 2011). "Interpreting protein chemical shift data". Progress in Nuclear Magnetic Resonance Spectroscopy. 58 (1–2): 62–87. Bibcode:2011PNMRS..58...62W. doi:10.1016/j.pnmrs.2010.07.004. PMID 21241884.
[2] Neal, S; Nip AM; Zhang H; Wishart DS (Jul 2003). "Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts". Journal of Biomolecular NMR. 26 (3): 215–240. doi:10.1023/A:1023812930288. PMID 12766419. S2CID 29425090.
[3] Wishart, DS; Watson, M.S.; Boyko, R.F.; Sykes, B.D. (Dec 1997). "Automated 1H and 13C Chemical Shift Prediction Using the BioMagResBank". Journal of Biomolecular NMR. 10 (4): 329–336. doi:10.1023/A:1018373822088. PMID 9460240. S2CID 6004996.
[4] Han, Beomsoo; Yifeng Liu; Simon Ginzinger; David Wishart (May 2011). "SHIFTX2: significantly improved protein chemical shift prediction". Journal of Biomolecular NMR. 50 (1): 43–57. doi:10.1007/s10858-011-9478-4. PMC 3085061. PMID 21448735.
[5] Williamson, MP; Asakura, T (Jul 1997). "Protein Chemical Shifts". Protein NMR Techniques. Methods in Molecular Biology. Vol. 60. pp. 53–69. doi:10.1385/0-89603-309-0:53. ISBN 978-0-89603-309-2. PMID 9276246.
[6] Case, DA (Oct 1998). "The use of chemical shifts and their anisotropies in biomolecular structure determination". Current Opinion in Structural Biology. 8 (5): 624–630. doi:10.1016/S0959-440X(98)80155-3. PMID 9818268.
[7] Case, DA (Apr 2000). "Interpretation of chemical shifts and coupling constants in macromolecules". Current Opinion in Structural Biology. 10 (2): 197–203. doi:10.1016/S0959-440X(00)00068-3. PMID 10753812.
[8] Wishart, DS; Case, DA (2001). "Use of Chemical Shifts in Macromolecular Structure Determination". Nuclear Magnetic Resonance of Biological Macromolecules Part A. Methods in Enzymology. Vol. 338. pp. 3–34. doi:10.1016/s0076-6879(02)38214-4. ISBN 9780121822392. PMID 11460554.
[9] Seavey, B.R.; Farr, E.A.; Westler, W.M. & Markley, J.L. (1991). "A relational database for sequence-specific protein NMR data". Journal of Biomolecular NMR. 1 (3): 217–236. doi:10.1007/BF01875516. PMID 1841696. S2CID 33755287.
[10] Zhang, H; Neal, S. & Wishart, D.S. (Mar 2003). "RefDB: A database of uniformly referenced protein chemical shifts". J. Biomol. NMR. 25 (3): 173–195. doi:10.1023/A:1022836027055. PMID 12652131. S2CID 12786364.
[11] Iwadate, M; Asakura T; Williamson MP (1999). "C-alpha and C-beta carbon-13 chemical shifts in proteins from an empirical database". J Biomol NMR. 13 (3): 199–211. doi:10.1023/A:1008376710086. PMID 10212983. S2CID 43991686.
[12] Xu, XP; Case DA (2001). "Automated prediction of 15N, 13Calpha, 13Cbeta and 13C′ chemical shifts in proteins using a density functional database". J Biomol NMR. 21 (4): 321–333. doi:10.1023/A:1013324104681. PMID 11824752. S2CID 665000.
[13] Vila, JA; Arnautova YA; Martin OA (2009). "Quantum-mechanics-derived 13Calpha chemical shift server (CheShift) for protein structure validation". Proc Natl Acad Sci USA. 106 (40): 16972–16977. Bibcode:2009PNAS..10616972V. doi:10.1073/pnas.0908833106. PMC 2761357. PMID 19805131.
[14] Meiler, J (2003). "PROSHIFT: protein chemical shift prediction using artificial neural networks". J Biomol NMR. 26 (1): 25–37. doi:10.1023/A:1023060720156. PMID 12766400. S2CID 16360110.
[15] Shen, Y; Bax A (2007). "Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology". J Biomol NMR. 38 (4): 289–302. doi:10.1007/s10858-007-9166-6. PMID 17610132. S2CID 12886163.
[16] Shen, Yang; Ad Bax (2010). "SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network". J Biomol NMR. 48 (1): 13–22. doi:10.1007/s10858-010-9433-9. PMC 2935510. PMID 20628786.
[17] Kohlhoff, KJ; Robustelli P; Cavalli A; Salvatella X; Vendruscolo M (2009). "Fast and accurate predictions of protein NMR chemical shifts from interatomic distances". J Am Chem Soc. 131 (39): 13894–13895. CiteSeerX 10.1.1.476.7079. doi:10.1021/ja903772t. PMID 19739624.
