Protein structure: Difference between revisions

Revision as of 04:48, 10 November 2010

Proteins are an important class of biological macromolecules present in all organisms. All proteins are polymers of amino acids. Classified by their physical size, proteins are nanoparticles (defined as 1-100 nm). Each protein polymer, also known as a polypeptide, consists of a sequence built from 20 different L-α-amino acids, also referred to as residues. For chains under 40 residues the term peptide is frequently used instead of protein. To perform their biological function, proteins fold into one or more specific spatial conformations, driven by a number of non-covalent interactions such as hydrogen bonding, ionic interactions, van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, and dual polarisation interferometry to determine the structure of proteins.

Protein structures range in size from tens to several thousand residues.^[1] Very large aggregates can be formed from protein subunits: for example, many thousands of actin molecules assemble into a microfilament.

A protein may undergo reversible structural changes in performing its biological function. The alternative structures of the same protein are referred to as different conformations, and transitions between them are called conformational changes.

Protein covalent structure and stereochemistry

An α-amino acid. The C_αH atom is omitted in the diagram.

Protein amino acids are combined into a single polypeptide chain in a condensation reaction. This reaction is catalysed by the ribosome in a process known as translation.

Amino acid residues

Each α-amino acid consists of a backbone part that is present in all the amino acid types, and a side chain that is unique to each type of residue. An exception to this rule is proline, where the hydrogen atom of the backbone amino group is replaced by a bond to the side chain. Because the C_α atom is bound to four different groups it is chiral; however, only one of the isomers occurs in biological proteins. Glycine, however, is not chiral, since its side chain is a hydrogen atom. A simple mnemonic for the correct L-form is "CORN": when the C_α atom is viewed with the H in front, the remaining groups read "CO-R-N" in a clockwise direction.

The 20 naturally occurring amino acids have different physical and chemical properties, including their electrostatic charge, pKa, hydrophobicity, size and specific functional groups. These properties play a major role in molding protein structure.

The peptide bond

The peptide bond tends to be planar due to the delocalization of electrons, which gives it partial double-bond character. The rigid peptide dihedral angle, ω (rotation about the bond between C₁ and N, where C₁ is the carbonyl carbon), is almost always close to 180 degrees. The dihedral angles phi φ (rotation about the bond between N and Cα) and psi ψ (rotation about the bond between Cα and C₁) can take a certain range of possible values. These angles are the internal degrees of freedom of a protein; they control the protein's conformation. They are restrained by geometry to allowed ranges typical for particular secondary structure elements, and are represented in a Ramachandran plot. A few important bond lengths are given in the table below.

Peptide bond	Average length	Single bond	Average length	Hydrogen bond	Average length (±30 pm)
Cα - C	153 pm	C - C	154 pm	O-H --- O-H	280 pm
C - N	133 pm	C - N	148 pm	N-H --- O=C	290 pm
N - Cα	146 pm	C - O	143 pm	O-H --- O=C	280 pm
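
The dihedral angles φ, ψ and ω described in this section can be computed from the Cartesian coordinates of four consecutive backbone atoms. The following is a minimal sketch in Python, not taken from the article: the function name `dihedral` and the toy coordinates are illustrative assumptions, and the sign follows the usual convention in which a trans arrangement gives 180° and a cis arrangement gives 0°.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, about the bond p1-p2.

    Each point is an (x, y, z) tuple. For phi the four atoms would be
    C(i-1), N, CA, C; for psi N, CA, C, N(i+1); for omega CA, C, N(i+1), CA(i+1).
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)          # first bond vector
    b2 = sub(p2, p1)          # central bond (rotation axis)
    b3 = sub(p3, p2)          # last bond vector
    n1 = cross(b1, b2)        # normal of the plane p0, p1, p2
    n2 = cross(b2, b3)        # normal of the plane p1, p2, p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Toy geometry (hypothetical coordinates, arbitrary units): central bond along z.
trans_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(abs(trans_angle), abs(cis_angle))
```

Applying the same function to every residue of a structure yields the (φ, ψ) pairs that are plotted in a Ramachandran plot.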

Side-chain conformation

The atoms along the side chain are named with Greek letters in Greek alphabetical order: α, β, γ, δ, ε, and so on. C_α refers to the carbon atom of the backbone closest to the carbonyl group of that amino acid, C_β the second closest, and so on. The dihedral angles around the bonds between these atoms are named χ1, χ2, χ3, etc. The dihedral angle of the first movable atom of the side chain, γ, defined as N-C_α-C_β-X_γ, is named χ1. Side chains tend to adopt different staggered conformations called gauche(-), trans, and gauche(+), which correspond to rotation angles of 60°, 180°, and -60°, respectively, around the sp3-sp3 bonds.

The diversity of side-chain conformations is often expressed in rotamer libraries. A rotamer library is a collection of rotamers for each residue type. Side-chain dihedral angles are not evenly distributed, but for most side-chain types, the χ angles occur in tight clusters around certain values. Rotamer libraries are therefore usually derived from statistical analysis of side-chain conformations in known structures of proteins, by clustering observed conformations or by dividing dihedral angle space into bins and determining an average conformation in each bin.^[2]^[3]
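
The binning approach mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any published rotamer library: the bin centres are the three staggered angles named in the text, each χ value is assigned to the nearest centre, and the "average conformation" is taken as a circular mean. The function names and the sample χ1 values are hypothetical.

```python
import math
from collections import defaultdict

# Staggered conformations and their nominal angles, named as in the text above.
CENTRES = {"gauche(-)": 60.0, "trans": 180.0, "gauche(+)": -60.0}

def wrap(angle):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def nearest_bin(chi):
    """Name of the staggered conformation whose centre is closest to chi."""
    return min(CENTRES, key=lambda name: abs(wrap(chi - CENTRES[name])))

def circular_mean(angles):
    """Mean of angles in degrees that respects the wrap-around at +/-180."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def build_library(chi_values):
    """Return {bin name: (count, mean chi)} for the observed chi values."""
    bins = defaultdict(list)
    for chi in chi_values:
        bins[nearest_bin(chi)].append(chi)
    return {name: (len(vals), circular_mean(vals)) for name, vals in bins.items()}

# Made-up chi1 observations, in degrees.
library = build_library([55.0, 65.0, 175.0, -175.0, -62.0, -58.0])
for name, (count, mean) in library.items():
    print(name, count, round(mean, 1))
```

A circular mean is used because a plain arithmetic mean of 175° and -175° would give 0°, which is the opposite of the trans conformation that both values represent.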

Levels of protein structure

There are four distinct levels of protein structure.

Primary structure

The primary structure refers to the sequence of the different amino acids of the peptide or protein. The primary structure is held together by covalent peptide bonds, which are made during the process of protein biosynthesis, or translation. The two ends of the polypeptide chain are referred to as the carboxyl terminus (C-terminus) and the amino terminus (N-terminus), based on the nature of the free group on each extremity. Counting of residues always starts at the N-terminal end (NH₂-group), which is the end where the amino group is not involved in a peptide bond. The primary structure of a protein is determined by the gene corresponding to the protein. A specific sequence of nucleotides in DNA is transcribed into mRNA, which is read by the ribosome in a process called translation. The sequence of a protein is unique to that protein, and defines the structure and function of the protein. The sequence of a protein can be determined by methods such as Edman degradation or tandem mass spectrometry. Often, however, it is read directly from the sequence of the gene using the genetic code. Post-translational modifications such as disulfide formation, phosphorylations and glycosylations are usually also considered a part of the primary structure, and cannot be read from the gene.
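
Reading a protein sequence from a coding sequence using the genetic code can be illustrated with a short Python sketch. It assumes the standard genetic code, starts at the first base of the input rather than searching for a start codon, and stops at the first stop codon; the function name `translate` and the example mRNA are hypothetical. As noted above, post-translational modifications cannot be recovered this way.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"   # codons starting with T
               "LLLLPPPPHHQQRRRR"   # codons starting with C
               "IIIMTTTTNNKKSSRR"   # codons starting with A
               "VVVVAAAADDEEGGGG")  # codons starting with G
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Translate an mRNA (or coding DNA) string into one-letter residues.

    The returned string runs from the N-terminus to the C-terminus.
    """
    seq = mrna.upper().replace("U", "T")
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":            # stop codon ends the chain
            break
        residues.append(aa)
    return "".join(residues)

protein = translate("AUGGCCUGGUAA")
# Residue numbering starts at the N-terminal end, as described above.
for number, residue in enumerate(protein, start=1):
    print(number, residue)
```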

Secondary structure

An alpha-helix with hydrogen bonds (yellow dots)

Secondary structure refers to highly regular local sub-structures. Two main types of secondary structure, the alpha helix and the beta strand, were suggested in 1951 by Linus Pauling and coworkers.^[5] These secondary structures are defined by patterns of hydrogen bonds between the main-chain peptide groups. They have a regular geometry, being constrained to specific values of the dihedral angles ψ and φ on the Ramachandran plot. Both the alpha helix and the beta sheet represent a way of saturating all the hydrogen bond donors and acceptors in the peptide backbone. Some parts of the protein are ordered but do not form any regular structures. They should not be confused with random coil, an unfolded polypeptide chain lacking any fixed three-dimensional structure. Several sequential secondary structures may form a "supersecondary unit".^[6]

Tertiary structure

Tertiary structure refers to the three-dimensional structure of a single protein molecule. The alpha helices and beta sheets are folded into a compact globule. The folding is driven by non-specific hydrophobic interactions (the burial of hydrophobic residues away from water), but the structure is stable only when the parts of a protein domain are locked into place by specific tertiary interactions, such as salt bridges, hydrogen bonds, the tight packing of side chains, and disulfide bonds. Disulfide bonds are extremely rare in cytosolic proteins, since the cytosol is generally a reducing environment.

Quaternary structure

Quaternary structure is a larger assembly of several protein molecules or polypeptide chains, usually called subunits in this context. The quaternary structure is stabilized by the same non-covalent interactions and disulfide bonds as the tertiary structure. Complexes of two or more polypeptides (i.e. multiple subunits) are called multimers. Specifically, a multimer is called a dimer if it contains two subunits, a trimer if it contains three subunits, and a tetramer if it contains four subunits. The subunits are frequently related to one another by symmetry operations, such as a 2-fold axis in a dimer. Multimers made up of identical subunits are referred to with a prefix of "homo-" (e.g. a homotetramer) and those made up of different subunits are referred to with a prefix of "hetero-" (e.g. a heterotetramer, such as the two alpha and two beta chains of hemoglobin). Many proteins have no quaternary structure and function as monomers.
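
The naming scheme for multimers can be written down as a small Python sketch. The function `multimer_name` and its input format (one label per subunit, with identical labels standing for identical subunits) are assumptions made for illustration; only the sizes named in the text are covered.

```python
# Size names given in the text; larger assemblies are deliberately not covered.
SIZE_NAMES = {1: "monomer", 2: "dimer", 3: "trimer", 4: "tetramer"}

def multimer_name(subunits):
    """Name an assembly from a list of subunit labels.

    Identical labels mean identical subunits, e.g. ["alpha", "alpha"].
    """
    count = len(subunits)
    if count not in SIZE_NAMES:
        raise ValueError("only assemblies of 1 to 4 subunits are named here")
    if count == 1:
        return SIZE_NAMES[1]
    # One distinct label means all subunits are identical ("homo-").
    prefix = "homo" if len(set(subunits)) == 1 else "hetero"
    return prefix + SIZE_NAMES[count]

# Hemoglobin, with two alpha and two beta chains, is the example from the text.
print(multimer_name(["alpha", "alpha", "beta", "beta"]))
print(multimer_name(["A", "A"]))
```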

Domains, motifs, and folds in protein structure

Proteins are frequently described as consisting of several structural units.

A structural domain is an element of the protein's overall structure that is self-stabilizing and often folds independently of the rest of the protein chain. Many domains are not unique to the protein products of one gene or one gene family but instead appear in a variety of proteins. Domains often are named and singled out because they figure prominently in the biological function of the protein they belong to; for example, the "calcium-binding domain of calmodulin". Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeras.

Structural and sequence motifs are short segments of protein three-dimensional structure or amino acid sequence that are found in a large number of different proteins.

Supersecondary structure refers to a specific combination of secondary structure elements, such as beta-alpha-beta units or the helix-turn-helix motif. Some of these may also be referred to as structural motifs.

Protein fold refers to the general protein architecture, such as a helix bundle, beta-barrel, Rossmann fold, or the different "folds" provided in the Structural Classification of Proteins database.^[7]

Despite the fact that there are about 100,000 different proteins expressed in eukaryotic systems, there are many fewer different domains, structural motifs and folds. This is partly a consequence of evolution, since genes or parts of genes can be duplicated or moved around within the genome. This means that, for example, a protein domain might be moved from one protein to another, thus giving the protein a new function. Because of these processes, pathways and mechanisms tend to be reused in several different proteins.

Protein folding

An unfolded polypeptide folds from a random coil into its characteristic three-dimensional structure.

Protein structure determination

Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. This method allows one to measure the 3D density distribution of electrons in the protein (in the crystallized state) and thereby infer the 3D coordinates of all the atoms to a certain resolution. Roughly 9% of the known protein structures have been obtained by nuclear magnetic resonance (NMR) techniques. The secondary structure composition can be determined via circular dichroism or dual polarisation interferometry. Cryo-electron microscopy has recently become a means of determining protein structures to high resolution (less than 5 ångströms, or 0.5 nanometers) and is anticipated to increase in power as a tool for high-resolution work in the next decade. It is also a valuable resource for researchers working with very large protein complexes such as virus coat proteins and amyloid fibers.

Structure classification

Protein structures can be classified based on their similarity or a common evolutionary origin. The SCOP and CATH databases provide two different structural classifications of proteins.

Computational prediction of protein structure

Generating a protein sequence is much easier than determining a protein structure. However, the structure of a protein gives much more insight into its function than its sequence does. Therefore, a number of methods for the computational prediction of protein structure from sequence have been developed.^[8] Ab initio prediction methods use just the sequence of the protein. Threading and homology modeling methods can build a 3D model for a protein of unknown structure from experimental structures of evolutionarily related proteins.

Protein structure related software

Software is available to aid researchers working on different, often overlapping, aspects of protein structure. The most basic functionality is structure visualization. Analysis of protein structure can be facilitated by software that aligns structures. In the absence of existing structures for a given protein sequence, there are methods to predict or model the structure of such sequences based on known protein structures. Given models of known or predicted structures, one can use software to check them for errors, predict protein conformational changes, or predict substrate binding sites.

References

[1] Brocchieri L, Karlin S (10 June 2005). "Protein length in eukaryotic and prokaryotic proteomes". Nucleic Acids Research. 33 (10): 3390–3400. doi:10.1093/nar/gki615. PMC 1150220. PMID 15951512.
[2] Dunbrack RL (2002). "Rotamer Libraries in the 21st Century". Curr. Opin. Struct. Biol. 12 (4): 431–440. doi:10.1016/S0959-440X(02)00344-5. PMID 12163064.
[3] Dunbrack Rotamer Libraries. http://dunbrack.fccc.edu/bbdep
[4] Dunbrack Rotamer Libraries. http://dunbrack.fccc.edu/bbdep
[5] Pauling L, Corey RB, Branson HR (1951). "The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain". Proc Natl Acad Sci USA. 37 (4): 205–211. doi:10.1073/pnas.37.4.205. PMC 1063337. PMID 14816373.
[6] Chiang YS, Gelfand TI, Kister AE, Gelfand IM (2007). "New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage". Proteins. 68 (4): 915–921. doi:10.1002/prot.21473. PMID 17557333.
[7] Govindarajan S, Recabarren R, Goldstein RA (17 September 1999). "Estimating the total number of protein folds". Proteins. 35 (4): 408–414. doi:10.1002/(SICI)1097-0134(19990601)35:4<408::AID-PROT4>3.0.CO;2-A. PMID 10382668.
[8] Zhang Y (2008). "Progress and challenges in protein structure prediction". Curr Opin Struct Biol. 18 (3): 342–348. doi:10.1016/j.sbi.2008.02.004. PMC 2680823. PMID 18436442.

External links

Wikis

PDBWiki: a discussion forum for macromolecular structures
Proteopedia: annotation of protein structures and other biomolecules
TOPSAN: annotation of protein structures in structural genomics

Servers

SSS Database: super-secondary structure protein database
SPROUTS (Structural Prediction for pRotein fOlding UTility System)
ProSA-web: a web service for the recognition of errors in experimentally or theoretically determined protein structures
NQ-Flipper: checks for unfavorable rotamers of Asn and Gln residues in protein structures
WHAT IF servers: check nearly 200 aspects of protein structure, such as packing, geometry, unfavourable rotamers in general or for Asn, Gln, and His especially, strange water molecules, backbone conformations, atom nomenclature, symmetry parameters, etc.


