Because proteins are such large molecules, all-atom simulations of their behaviour are severely limited in the timescales they can reach. The millisecond regime for all-atom simulations was not reached until 2010, and it is still not possible to fold all real proteins on a computer. Lattice proteins simplify the problem in two ways: each whole residue (amino acid) is modeled as a single "bead" or "point" drawn from a finite set of types (usually only two), and each residue is restricted to the vertices of a (usually cubic) lattice. To guarantee the connectivity of the protein chain, residues that are adjacent on the backbone must be placed on adjacent vertices of the lattice. Steric constraints are expressed by requiring that no more than one residue occupy the same lattice vertex. Simplifications of this kind significantly reduce the computational effort of handling the model, although even in this simplified scenario the protein folding problem is NP-complete.
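The two placement rules described above (chain connectivity and one residue per vertex) can be checked mechanically. The following is a minimal sketch for the cubic lattice; the function name `is_valid_conformation` and the representation of a conformation as a list of integer coordinate triples are assumptions made for illustration, not part of any cited model.

```python
from typing import Sequence, Tuple

Coord = Tuple[int, int, int]  # integer vertex of the cubic lattice (assumed representation)


def is_valid_conformation(coords: Sequence[Coord]) -> bool:
    """Return True if coords is a connected, self-avoiding chain on the cubic lattice."""
    # Steric constraint: no two residues may occupy the same vertex.
    if len(set(coords)) != len(coords):
        return False
    # Connectivity: consecutive residues must sit exactly one lattice edge apart.
    # For integer coordinates, a Manhattan distance of 1 means a unit step along one axis.
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            return False
    return True


print(is_valid_conformation([(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]))  # True
print(is_valid_conformation([(0, 0, 0), (1, 0, 0), (0, 0, 0)]))             # False: overlap
```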
Different versions of lattice proteins may adopt different types of lattice (typically square and triangular ones), in two or three dimensions, but it has been shown that generic lattices can be used and handled via a uniform approach.
Lattice proteins are made to resemble real proteins by introducing an energy function, a set of conditions that specify the interaction energy between beads occupying adjacent lattice sites. The energy function mimics the interactions between amino acids in real proteins, which include steric, hydrophobic and hydrogen bonding effects. The beads are divided into types, and the energy function specifies the interactions depending on the bead type, just as different types of amino acids interact differently. One of the most popular lattice models, the hydrophobic-polar model (HP model), features just two bead types, hydrophobic (H) and polar (P), and mimics the hydrophobic effect by specifying a favorable interaction between H beads.
For any sequence in any particular structure, an energy can be rapidly calculated from the energy function. For the simple HP model, this amounts to counting all contacts between H residues that are adjacent in the structure but not in the chain. Most researchers consider a lattice protein sequence protein-like only if it possesses a single structure whose energy is lower than that of every other structure. This is the energetic ground state, or native state. The relative positions of the beads in the native state constitute the lattice protein's tertiary structure. Lattice proteins do not have genuine secondary structure; however, some researchers have claimed that they can be extrapolated onto real protein structures which do include secondary structure, by appealing to the same law by which the phase diagrams of different substances can be scaled onto one another (the theorem of corresponding states).
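The HP energy calculation described above can be sketched in a few lines for a 2D square lattice. The function name `hp_energy`, the coordinate-list representation, and the value of -1 per H-H contact are assumptions for illustration; the text only states that H-H contacts are favorable.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on the 2D square lattice.

    sequence: string of 'H' and 'P', one character per residue.
    coords:   list of (x, y) vertices, one per residue, assumed to be a valid
              self-avoiding chain.
    Each H-H pair that is adjacent on the lattice but not along the chain
    contributes -1 (an assumed, commonly used convention).
    """
    index_at = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Inspect only the right and upper neighbours so each pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # chain folded around a unit square
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]  # fully extended chain
print(hp_energy("HPPH", square))    # -1: residues 0 and 3 touch
print(hp_energy("HPPH", straight))  # 0: no non-bonded contacts
```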
By varying the energy function and the bead sequence of the chain (the primary structure), effects on the native state structure and the kinetics of folding can be explored, and this may provide insights into the folding of real proteins. In particular, lattice models have been used to investigate the energy landscapes of proteins, i.e. the variation of their internal free energy as a function of conformation.
A lattice is a set of regularly arranged points that are connected by "edges". These points are called vertices, and each is connected to a certain number of other vertices in the lattice by edges. The number of vertices each individual vertex is connected to is called the coordination number of the lattice, and it can be scaled up or down by changing the shape or dimension (2-dimensional to 3-dimensional, for example) of the lattice. This number is important in shaping the characteristics of the lattice protein because it controls the number of other residues allowed to be adjacent to a given residue. It has been shown that for most proteins the coordination number of the lattice used should fall between 3 and 20, although most commonly used lattices have coordination numbers at the lower end of this range.
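To make the coordination number concrete, the sketch below lists the neighbour vectors of a few lattices and counts them. The helper `neighbour_vectors`, the selection of lattices, and the axial-coordinate encoding of the triangular lattice are choices made for this illustration only.

```python
from itertools import product


def neighbour_vectors(dim, nonzero_counts):
    """Vectors in {-1, 0, 1}^dim whose number of non-zero components is allowed."""
    return [
        v for v in product((-1, 0, 1), repeat=dim)
        if sum(c != 0 for c in v) in nonzero_counts
    ]


LATTICES = {
    "square": neighbour_vectors(2, {1}),                 # 4 axis-aligned steps
    "square with diagonals": neighbour_vectors(2, {1, 2}),  # 4 axis steps + 4 diagonals
    "cubic": neighbour_vectors(3, {1}),                  # 6 axis-aligned steps
    "face-centred cubic": neighbour_vectors(3, {2}),     # 12 permutations of (+-1, +-1, 0)
    # Triangular lattice written in axial coordinates (an assumed encoding).
    "triangular": [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)],
}

# The coordination number is simply the number of neighbour vectors.
coordination = {name: len(vectors) for name, vectors in LATTICES.items()}
for name, z in coordination.items():
    print(f"{name}: {z}")
```

Under these definitions the coordination numbers are 4, 8, 6, 12 and 6 respectively, all within the range quoted above.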
Lattice shape is an important factor in the accuracy of lattice protein models. Changing the lattice shape can dramatically alter the shape of the energetically favorable conformations. It can also add unrealistic constraints to the protein structure, as in the parity problem: on square and cubic lattices, residues of the same parity (both odd-numbered or both even-numbered) cannot make hydrophobic contact. It has also been reported that triangular lattices yield more accurate structures than other lattice shapes when compared to crystallographic data. To combat the parity problem, several researchers have suggested using triangular lattices when possible, or a square lattice with diagonals for theoretical applications where a square lattice may be more appropriate. Hexagonal lattices with diagonals have also been suggested as a way to combat the parity problem.
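The parity constraint on the square lattice follows from a short colouring argument, sketched here. The notation is introduced for this sketch rather than taken from the cited work: residue i sits at (x_i, y_i), and i and j are two residues in lattice contact.

```latex
\begin{align*}
(x_{i+1}, y_{i+1}) - (x_i, y_i) &\in \{(\pm 1, 0),\, (0, \pm 1)\}
  && \text{(chain connectivity)} \\
\Rightarrow\quad x_i + y_i &\equiv x_0 + y_0 + i \pmod{2}
  && \text{(each step flips the parity of } x + y\text{)} \\
|x_i - x_j| + |y_i - y_j| = 1
  \;&\Rightarrow\; x_i + y_i \not\equiv x_j + y_j \pmod{2}
  && \text{(lattice neighbours differ in parity)} \\
\Rightarrow\quad i &\not\equiv j \pmod{2}
  && \text{(contacts require opposite index parity)}
\end{align*}
```

The same reasoning applies to the cubic lattice with x + y + z in place of x + y. It fails on a triangular lattice or a square lattice with diagonals, because a step there does not always flip the parity, which is why those lattices avoid the problem.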
The hydrophobic-polar protein model is the original lattice protein model and is considered the paradigmatic one. It was first proposed by Dill et al. in 1985 as a way to overcome the significant cost and difficulty of predicting protein structure, using only the hydrophobicity of the amino acids in the protein to predict its structure. The method was able to quickly give an estimate of protein structure by representing proteins as "short chains on a 2D square lattice" and has since become known as the hydrophobic-polar model. It breaks the protein folding problem into three separate problems: modeling the protein conformation, defining the energetic properties of the amino acids as they interact with one another to find said conformation, and developing an efficient algorithm for the prediction of these conformations. This is done by classifying each amino acid in the protein as either hydrophobic or polar and assuming that the protein folds in an aqueous environment. The lattice statistical model seeks to recreate protein folding by minimizing the free energy of the contacts between hydrophobic amino acids. Hydrophobic amino acid residues are predicted to group around each other, while hydrophilic residues interact with the surrounding water.
Problems and alternative models
The simplicity of the hydrophobic-polar model gives rise to several problems that alternative lattice protein models attempt to correct. Chief among these is degeneracy: the modeled protein has more than one minimum-energy conformation, which leaves it uncertain which conformation is the native one. Attempts to address this include the HPNX model, which classifies amino acids as hydrophobic (H), positive (P), negative (N), or neutral (X) according to their charge; the additional parameters reduce the number of low-energy conformations and allow for more realistic protein simulations. Another is the Crippen model, which uses protein characteristics taken from crystal structures to inform the choice of native conformation.
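For very short chains, degeneracy in the HP model can be examined by brute force. The sketch below enumerates every self-avoiding conformation of a sequence on the 2D square lattice and collects those with the lowest energy. All names (`ground_states`, the U/D/L/R move encoding), the -1 energy per H-H contact, and the rule used to discard rotated and mirrored copies are assumptions for illustration; the approach is exhaustive and only practical for a handful of residues.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk_from_moves(moves):
    """Convert a move sequence to coordinates, or None if the chain overlaps itself."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (coords[-1][0] + dx, coords[-1][1] + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords


def is_canonical(moves):
    """Keep one representative per rotation/reflection class:
    the first move is R and the first vertical move, if any, is U."""
    if moves[0] != "R":
        return False
    for m in moves:
        if m in "UD":
            return m == "U"
    return True


def hp_energy(sequence, coords):
    """-1 per H-H lattice contact between residues not adjacent in the chain."""
    index_at = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and abs(i - j) > 1 and sequence[i] == sequence[j] == "H":
                energy -= 1
    return energy


def ground_states(sequence):
    """Return (lowest energy, list of move strings reaching it); needs >= 2 residues."""
    best, states = None, []
    for moves in product("UDLR", repeat=len(sequence) - 1):
        if not is_canonical(moves):
            continue
        coords = walk_from_moves(moves)
        if coords is None:
            continue
        e = hp_energy(sequence, coords)
        if best is None or e < best:
            best, states = e, ["".join(moves)]
        elif e == best:
            states.append("".join(moves))
    return best, states


print(ground_states("HPPH"))          # (-1, ['RUL']): a unique ground state
print(len(ground_states("HPHP")[1]))  # 5: every conformation ties at energy 0
```

In this toy example the sequence HPPH has a single lowest-energy conformation, whereas HPHP is fully degenerate because its two H residues share the same index parity and can never touch on a square lattice.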
Another issue with lattice models is that they generally do not take into account the space occupied by amino acid side chains, instead considering only the α-carbon. The side chain model addresses this by adding a side chain on a vertex adjacent to the α-carbon.
- Lau KF, Dill KA (1989). "A lattice statistical mechanics model of the conformational and sequence spaces of proteins". Macromolecules. 22 (10): 3986–97. doi:10.1021/ma00200a030.
- Voelz VA, Bowman GR, Beauchamp K, Pande VS (February 2010). "Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1-39)". Journal of the American Chemical Society. 132 (5): 1526–8. doi:10.1021/ja9090353. PMC 2835335. PMID 20070076.
- Bechini A (2013). "On the characterization and software implementation of general protein lattice models". PLOS One. 8 (3): e59504. doi:10.1371/journal.pone.0059504. PMC 3612044. PMID 23555684.
- Berger B, Leighton T (1998). "Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete". Journal of Computational Biology. 5 (1): 27–40. doi:10.1089/cmb.1998.5.27. PMID 9541869.
- Dubey SP, Kini NG, Balaji S, Kumar MS (2018). "A Review of Protein Structure Prediction Using Lattice Model". Critical Reviews in Biomedical Engineering. 46 (2): 147–162. doi:10.1615/critrevbiomedeng.2018026093. PMID 30055531.
- Dill KA (March 1985). "Theory for the folding and stability of globular proteins". Biochemistry. 24 (6): 1501–9. doi:10.1021/bi00327a032. PMID 3986190.
- Su SC, Lin CJ, Ting CK (December 2010). "An efficient hybrid of hill-climbing and genetic algorithm for 2D triangular protein structure prediction". Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference. IEEE. pp. 51–56. doi:10.1109/BIBMW.2010.5703772. ISBN 978-1-4244-8303-7.
- Onuchic JN, Wolynes PG, Luthey-Schulten Z, Socci ND (April 1995). "Toward an outline of the topography of a realistic protein-folding funnel". Proceedings of the National Academy of Sciences of the United States of America. 92 (8): 3626–30. doi:10.1073/pnas.92.8.3626. PMC 42220. PMID 7724609.
- Moreno-Hernández S, Levitt M (June 2012). "Comparative modeling and protein-like features of hydrophobic-polar models on a two-dimensional lattice". Proteins. 80 (6): 1683–93. doi:10.1002/prot.24067. PMC 3348970. PMID 22411636.
- Backofen R, Will S, Bornberg-Bauer E (March 1999). "Application of constraint programming techniques for structure prediction of lattice proteins with extended alphabets". Bioinformatics. 15 (3): 234–42. doi:10.1093/bioinformatics/15.3.234. PMID 10222411.
- Crippen GM (April 1991). "Prediction of protein folding from amino acid sequence over discrete conformation spaces". Biochemistry. 30 (17): 4232–7. doi:10.1021/bi00231a018. PMID 2021616.
- Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS (April 1995). "Principles of protein folding--a perspective from simple exact models". Protein Science. 4 (4): 561–602. doi:10.1002/pro.5560040401. PMC 2143098. PMID 7613459.