Talk:Protein structure

WikiProject Molecular and Cell Biology (Rated B-class, High-importance)
This article is within the scope of the WikiProject Molecular and Cell Biology. To participate, visit the WikiProject for more information.
This article has been rated as B-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.


The original text for this article was taken from: and claimed as "public domain". However on the Disclaimer linked from this page: it says:

Copyright Status
LLNL-authored documents are sponsored by the U.S. Department of Energy under Contract W-7405-Eng-48. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce these documents, or allow others to do so, for U.S. Government purposes. All documents available from this server may be protected under the U.S. and Foreign Copyright Laws. Permission to reproduce may be required.

It is thus not clear that these pages can be used under public domain provisions, although the DOE (which funds LLNL), being a US govt. agency, should in theory be covered under the blanket no-copyright rule for all federal US agencies. It concerns me that they have the statement: "or allow [...] for U.S. Government purposes". Perhaps they are more worried about the copyright status of text they didn't author. If a copyright-wonk could clarify this for us, it would be useful. Thanks. --Lexor|Talk 02:24, Oct 8, 2004 (UTC)

Too much amino acid content?

It seems like there is way too much detailed and redundant information about amino acids in this topic. This topic should focus on generic aspects of protein structure. There is already ample description of amino acid chemistry in the amino acid article. Anyone else have thoughts on this? Speedyboy 21:19, 31 January 2007 (UTC)

Dominant forces in 2° and 3° structure?

Hi, just a word about the recent edit specifying the molecular forces stabilizing different types of protein structure. Protein 2° structure is certainly defined by its hydrogen bonds, but it might be a little misleading to say that it is stabilized by its hydrogen bonds; if that were true, then peptides taken from a protein would adopt stable 2° structure, which is extremely rare. The reason is that water-amide hydrogen bonds are generally more favorable than amide-amide hydrogen bonds; that's why the 2° structure disintegrates when the 3° structure of a protein is unfolded by tension, pressure or cold denaturation.

Likewise, disulfide bonds are extremely rare in cytosolic proteins, since the cytosol is a reducing environment thanks to the high concentration of glutathione. With a few exceptions in thermophiles, disulfide bonds are found only in extracellular proteins such as secreted toxins, inhibitors, digestive enzymes, etc. So I hope you won't mind the changes I'm about to make in the edit; thanks for being understanding! :) Willow 17:35, 5 February 2007 (UTC)

Yes, it is stabilized by hydrogen bonds (~1.5 kcal/mol per water-inaccessible H-bond). The only stable structures adopted by peptides (unless they are cyclic) are alpha-helices or beta-hairpins, precisely for that reason. The intramolecular H-bonds in proteins are "stronger" than dynamic H-bonds with water because protein structure represents a solid state. When liquids or flexible polymers (like a protein coil) are frozen, they gain enthalpy of fusion. The energy of protein H-bonds is a part of the enthalpy of fusion. Biophys (talk) 01:27, 4 April 2009 (UTC)

Additional structural motifs

I'm searching hard for an explanation of the term beta-jelly-roll motif. Some help would be great!

Thx Monsterous mad max 14:23, 16 August 2007 (UTC)

Alpha Amino Acid picture

Could this be replaced with this one without losing information: ? —Preceding unsigned comment added by (talk) 23:42, 16 February 2009 (UTC)

Citation Needed for (avg)300 Residues

I'm not sure how wikipedia works (usually I just leech), but "However, the current estimate for the average protein length is around 300 residues." as found in the second paragraph of the intro is citation-needed as of Mar. 24/09, and I believe that a suitable citation may be found in "Protein length in eukaryotic and prokaryotic proteomes", written by Luciano Brocchieri and Samuel Karlin in 2005 (see especially Table 2). —Preceding unsigned comment added by (talk) 06:08, 24 March 2009 (UTC)

Thanks for the reference; it is a perfect one. I have added it to the article. I would like to reproduce Table 2 from this paper somewhere on Wikipedia. --Thorwald (talk) 16:24, 24 March 2009 (UTC)

beta loops

I'm unsure, but is a beta-loop the same as a beta-bend (beta-turn)? According to my prof., a beta-bend is usually found in the outskirts of a protein between two beta-sheets. My book describes beta-bends as:

  • reverses the direction
  • found on the surface
  • often charged residues
  • connect anti-parallel b-sheets
  • stabilized by hydrogen and ionic bonds

I don't think the described beta-loops are a perfect match. If they aren't, b-bend should be added. If they are, the b-loop part should be edited. But as I said, I'm unsure. Help?

--Nisse1337 (talk) 11:22, 25 March 2009 (UTC)

There is no such thing as a "beta-loop". There are only beta-turns. Biophys (talk) 01:18, 4 April 2009 (UTC)

--Are you perhaps looking for the "Omega loop"?

Some problems here

Did anyone notice that primary structure was described several times in this article? Same with secondary structure. Biophys (talk) 01:17, 4 April 2009 (UTC)

.. Fix it then. Could someone who has time include a link to Conformational change and maybe improve that article too, as there should be something on this page about how phosphorylation or ligand binding can affect the protein structure and activate catalytic domains, etc. (talk) 12:06, 9 April 2009 (UTC)
Main problem: someone included (by copy and paste) this segment of text, which made this article unreadable. It also duplicates text provided in other articles. Biophys (talk) 01:40, 15 April 2009 (UTC)

Primary structure of proteins

The primary structure of peptides and proteins refers to the linear number and order of the amino acids present. The convention for the designation of the order of amino acids is that the N-terminal end (i.e. the end bearing the residue with the free α-amino group) is to the left (and the number 1 amino acid) and the C-terminal end (i.e. the end with the residue containing a free α-carboxyl group) is to the right.

The proposal that proteins were linear chains of α-amino acids was made nearly simultaneously by two scientists at the same conference in 1902, the 74th meeting of the Society of German Scientists and Physicians, held in Karlsbad. Franz Hofmeister made the proposal in the morning, based on his observations of the biuret reaction in proteins. Hofmeister was followed a few hours later by Emil Fischer, who had amassed a wealth of chemical details supporting the peptide-bond model. For completeness, the proposal that proteins contained amide linkages was made as early as 1882 by the French chemist E. Grimaux.

Despite these data and later evidence that proteolytically digested proteins yielded only oligopeptides, the idea that proteins were linear, unbranched polymers of amino acids was not accepted immediately. Some well-respected scientists such as William Astbury doubted that covalent bonds were strong enough to hold such long molecules together; they feared that thermal agitations would shake such long molecules asunder. Hermann Staudinger faced similar prejudices in the 1920s when he argued that rubber was composed of macromolecules.

Thus, several alternative hypotheses arose. The colloidal protein hypothesis stated that proteins were colloidal assemblies of smaller molecules. This hypothesis was disproven in the 1920s by ultracentrifugation measurements by The Svedberg, which showed that proteins had a well-defined, reproducible molecular weight, and by electrophoretic measurements by Arne Tiselius, which indicated that proteins were single molecules. A second hypothesis, the cyclol hypothesis advanced by Dorothy Wrinch, proposed that the linear polypeptide underwent a chemical cyclol rearrangement, C=O + HN → C(OH)-N, that crosslinked its backbone amide groups, forming a two-dimensional fabric. Other primary structures of proteins were proposed by various researchers, such as the diketopiperazine model of Emil Abderhalden and the pyrrol/piperidine model of Troensegaard in 1942. Although never given much credence, these alternative models were finally disproven by Frederick Sanger's successful sequencing of insulin and by the crystallographic determination of myoglobin and hemoglobin by Max Perutz and John Kendrew.

The primary structure of a biological polymer to a large extent determines the three-dimensional shape known as the tertiary structure, but nucleic acid and protein folding are so complex that knowing the primary structure often doesn't help either to deduce the shape or to predict localized secondary structure, such as the formation of loops or helices. However, knowing the structure of a similar homologous sequence (for example a member of the same protein family) can unambiguously identify the tertiary structure of the given sequence. Sequence families are often determined by sequence clustering, and structural genomics projects aim to produce a set of representative structures to cover the sequence space of possible non-redundant sequences.

Secondary structure in proteins

The ordered array of amino acids in a protein confers regular conformational forms upon that protein. These conformations constitute the secondary structures of a protein. In general, proteins fold into two broad classes of structure, termed globular proteins and fibrous proteins. Globular proteins are compactly folded and coiled, whereas fibrous proteins are more filamentous or elongated. It is the partial double-bond character of the peptide bond that defines the conformations a polypeptide chain may assume. Within a single protein, different regions of the polypeptide chain may assume different conformations determined by the primary sequence of the amino acids.

The α-Helix

The α-helix is a common secondary structure encountered in proteins of the globular class. About 35% of all amino acids in proteins are in α-helices, but in individual protein molecules this number ranges from 0 to 80%. The formation of the α-helix is spontaneous and is stabilized by H-bonding between amide nitrogens and carbonyl oxygens of peptide bonds spaced four residues apart. This orientation of H-bonding produces a helical coiling of the peptide backbone such that the R-groups lie on the exterior of the helix and perpendicular to its axis. The average length of α-helices is 10.5 amino acids, but the range is from 4 to several dozen.
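
As a rough illustration of the "four residues apart" spacing described above, the following Python sketch lists the backbone H-bond pairs of an idealized α-helix. The function name `helix_hbond_pairs` and the 1-based numbering are assumptions made for this example, as is the usual convention that the C=O of residue i pairs with the N-H of residue i+4. Real helices fray at their ends, so this is only a counting aid, not a structural model.

```python
def helix_hbond_pairs(n_residues, spacing=4):
    """Return (acceptor, donor) residue pairs for an idealized helix.

    Residues are numbered from 1 at the N-terminus. The backbone C=O of
    residue i is paired with the backbone N-H of residue i + spacing.
    """
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return [(i, i + spacing) for i in range(1, n_residues - spacing + 1)]


if __name__ == "__main__":
    # A hypothetical 10-residue helix, close to the average length quoted above.
    for acceptor, donor in helix_hbond_pairs(10):
        print(f"C=O of residue {acceptor} ... H-N of residue {donor}")
```

A stretch shorter than five residues yields no pairs in this idealization, since no residue then has a partner four positions further along.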

Not all amino acids favor the formation of the α-helix, due to steric constraints of the R-groups. Amino acids such as A, D, E, I, L and M favor the formation of α-helices, whereas G and P favor disruption of the helix. This is particularly true for P, since it is a pyrrolidine-based imino acid (HN=) whose structure significantly restricts movement about the peptide bond in which it is present, thereby interfering with extension of the helix. The disruption of the helix is important, as it introduces additional folding of the polypeptide backbone to allow the formation of globular proteins.
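
The residue lists in the paragraph above can be turned into a simple lookup. The sketch below marks each position of a sequence as a helix former, a helix breaker, or neither, using only the one-letter codes named in the text. The function name, the marker characters, and the example sequence are hypothetical, and this is not a secondary structure predictor.

```python
HELIX_FORMERS = set("ADEILM")   # residues the text lists as favoring helices
HELIX_BREAKERS = set("GP")      # residues the text lists as disrupting helices


def annotate_helix_propensity(sequence):
    """Mark each residue: 'h' = former, 'b' = breaker, '.' = not listed."""
    marks = []
    for residue in sequence.upper():
        if residue in HELIX_FORMERS:
            marks.append("h")
        elif residue in HELIX_BREAKERS:
            marks.append("b")
        else:
            marks.append(".")
    return "".join(marks)


if __name__ == "__main__":
    example = "AEPGK"  # hypothetical peptide, written N-terminus first
    print(example)
    print(annotate_helix_propensity(example))
```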


β-Sheets

Whereas an α-helix is composed of a single linear array of helically disposed amino acids, β-sheets are composed of 2 or more different regions of stretches of at least 3-5 amino acids (average 4.5 amino acids). The folding and alignment of stretches of the polypeptide backbone alongside one another to form β-sheets is stabilized by H-bonding between amide nitrogens and carbonyl oxygens. However, the H-bonding residues are present in adjacently opposed stretches of the polypeptide backbone, as opposed to a linearly contiguous region of the backbone in the α-helix. β-sheets are said to be pleated. This is due to the positioning of the α-carbons, which alternate above and below the plane of the sheet. β-sheets are either parallel or antiparallel. In parallel sheets adjacent peptide chains proceed in the same direction (i.e. the direction of N-terminal to C-terminal ends is the same), whereas in antiparallel sheets adjacent chains are aligned in opposite directions. Most antiparallel β-sheets are formed from stretches adjacent in the sequence, connected by just a short loop. On the other hand, the strands of parallel β-sheets are always separated by a longer stretch of amino acids. β-sheets can be depicted in ball and stick format or as ribbons in certain protein formats.

Ball and Stick Representation of a β-Sheet
Ribbon Depiction of β-Sheet

Super-Secondary Structure

Many proteins contain an ordered organization of several adjacent elements of secondary structures that form distinct, commonly observed structural motifs larger than individual secondary structures but smaller than domains or subunits. They are often hypothesized to act as early steps in the process of protein folding. Examples include β-hairpins, helix hairpins, right-handed β-α-β loops, and the helix-turn-helix motifs of bacterial proteins that regulate transcription.

Tertiary Structure of Proteins

Tertiary structure refers to the complete three-dimensional structure of the polypeptide units of a given protein. Included in this description is the spatial relationship of different secondary structures to one another within a polypeptide chain and how these secondary structures themselves fold into the three-dimensional form of the protein. Secondary structures of proteins often constitute distinct domains. Therefore, tertiary structure also describes the relationship of different domains to one another within a protein. The interactions of different domains are governed by several forces. These include hydrogen bonding, hydrophobic interactions, electrostatic interactions, van der Waals forces and covalent bonding through disulfide bridges.

Quaternary Structure

Many proteins contain 2 or more different polypeptide chains that are held in association by the same non-covalent forces that stabilize the tertiary structures of proteins. Proteins with multiple polypeptide chains are oligomeric proteins. The structure formed by monomer-monomer interaction in an oligomeric protein is known as quaternary structure.

Oligomeric proteins can be composed of multiple identical polypeptide chains or multiple distinct polypeptide chains. Proteins with identical subunits are termed homo-oligomers. Proteins containing several distinct polypeptide chains are termed hetero-oligomers.

Hemoglobin, the oxygen-carrying protein of the blood, contains two α and two β subunits arranged in a quaternary structure of the form α2β2. Hemoglobin is, therefore, a hetero-oligomeric protein (I. Shahid et al., 2008).
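
The homo-/hetero-oligomer distinction can be expressed as a small classification over chain labels. The sketch below is only illustrative: the function names are hypothetical, and it assumes that two chains carrying the same label are identical polypeptides.

```python
from collections import Counter


def classify_oligomer(chains):
    """Classify a protein from a list of chain labels, one per polypeptide."""
    if not chains:
        raise ValueError("at least one chain is required")
    if len(chains) == 1:
        return "monomer"
    return "homo-oligomer" if len(set(chains)) == 1 else "hetero-oligomer"


def composition(chains):
    """Return a subunit formula such as 'α2β2' (counts of 1 are omitted)."""
    counts = Counter(chains)
    return "".join(
        f"{label}{count if count > 1 else ''}"
        for label, count in sorted(counts.items())
    )


if __name__ == "__main__":
    hemoglobin = ["α", "α", "β", "β"]  # two α and two β subunits, as in the text
    print(composition(hemoglobin), classify_oligomer(hemoglobin))
```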

Forces Controlling Protein Structure

Hydrogen Bonding

Polypeptides contain numerous proton donors and acceptors, both in their backbone and in the R-groups of the amino acids. The environment in which proteins are found also contains ample H-bond donors and acceptors in the form of water molecules. H-bonding, therefore, occurs not only within and between polypeptide chains but also with the surrounding aqueous medium.

Hydrophobic Forces

Proteins are composed of amino acids that contain either hydrophilic or hydrophobic R-groups. It is the nature of the interaction of the different R-groups with the aqueous environment that plays the major role in shaping protein structure. The spontaneous folded state of globular proteins is a reflection of a balance between the opposing energetics of H-bonding between hydrophilic R-groups and the aqueous environment and the repulsion from the aqueous environment by the hydrophobic R-groups. The hydrophobicity of certain amino acid R-groups tends to drive them away from the exterior of proteins and into the interior. This driving force restricts the available conformations into which a protein may fold.

Electrostatic Forces

Electrostatic forces are mainly of three types: charge-charge, charge-dipole and dipole-dipole. Typical charge-charge interactions that favor protein folding are those between oppositely charged R-groups such as K or R and D or E. A substantial component of the energy involved in protein folding is charge-dipole interactions. This refers to the interaction of ionized R-groups of amino acids with the dipole of the water molecule. The slight dipole moments that exist in the polar R-groups of amino acids also influence their interaction with water. It is, therefore, understandable that the majority of the amino acids found on the exterior surfaces of globular proteins contain charged or polar R-groups.

van der Waals Forces

There are both attractive and repulsive van der Waals forces that control protein folding. Attractive van der Waals forces involve the interactions among induced dipoles that arise from fluctuations in the charge densities that occur between adjacent uncharged non-bonded atoms. Repulsive van der Waals forces involve the interactions that occur when uncharged non-bonded atoms come very close together but do not induce dipoles. The repulsion is the result of the electron-electron repulsion that occurs as two clouds of electrons begin to overlap. Although van der Waals forces are extremely weak relative to other forces governing conformation, it is the huge number of such interactions occurring in large protein molecules that makes them significant to the folding of proteins.

Complex Protein Structures

Proteins also are found to be covalently conjugated with carbohydrates. These modifications occur following the synthesis (translation) of proteins and are, therefore, termed post-translational modifications. These forms of modification impart specialized functions upon the resultant proteins. Proteins covalently associated with carbohydrates are termed glycoproteins. Glycoproteins are of two classes, N-linked and O-linked, referring to the site of covalent attachment of the sugar moieties. N-linked sugars are attached to the amide nitrogen of the R-group of asparagine; O-linked sugars are attached to the hydroxyl groups of either serine or threonine and occasionally to the hydroxyl group of the modified amino acid, hydroxylysine.

There are extremely important glycoproteins found on the surface of erythrocytes. It is the variability in the composition of the carbohydrate portions of many glycoproteins and glycolipids of erythrocytes that determines blood group specificities. There are at least 100 blood group determinants, most of which are due to carbohydrate differences. The most common blood groups, A, B, and O, are specified by the activity of specific gene products whose activities are to incorporate distinct sugar groups onto RBC membrane glycosphingolipids as well as secreted glycoproteins.

Structural complexes involving protein associated with lipid via noncovalent interactions are termed lipoproteins. The distinct roles of lipoproteins are described on the linked page. Their major function in the body is to aid in the storage and transport of lipid and cholesterol.

Amino-Terminal Sequence Determination

Prior to sequencing peptides it is necessary to eliminate disulfide bonds within peptides and between peptides. Several different chemical reactions can be used in order to permit separation of peptide strands and prevent protein conformations that are dependent upon disulfide bonds. The most common treatments are to use either 2-mercaptoethanol or dithiothreitol (DTT). Both of these chemicals reduce disulfide bonds. To prevent reformation of the disulfide bonds the peptides are treated with iodoacetic acid in order to alkylate the free sulfhydryls.

There are three major chemical techniques for sequencing peptides and proteins from the N-terminus. These are the Sanger, Dansyl chloride and Edman techniques.

Sanger's Reagent
This sequencing technique utilizes the compound, 2,4-dinitrofluorobenzene (DNF) which reacts with the N-terminal residue under alkaline conditions. The derivatized amino acid can be hydrolyzed and will be labeled with a dinitrobenzene group that imparts a yellow color to the amino acid. Separation of the modified amino acids (DNP-derivative) by electrophoresis and comparison with the migration of DNP-derivative standards allows for the identification of the N-terminal amino acid.
Dansyl chloride
Like DNF, dansyl chloride reacts with the N-terminal residue under alkaline conditions. Analysis of the modified amino acids is carried out similarly to the Sanger method except that the dansylated amino acids are detected by fluorescence. This imparts a higher sensitivity into this technique over that of the Sanger method.
Edman degradation
The utility of the Edman degradation technique is that it allows for additional amino acid sequence to be obtained from the N-terminus inward. Using this method it is possible to obtain the entire sequence of peptides. This method utilizes phenylisothiocyanate to react with the N-terminal residue under alkaline conditions. The resultant phenylthiocarbamyl derivatized amino acid is hydrolyzed in anhydrous acid. The hydrolysis reaction results in a rearrangement of the released N-terminal residue to a phenylthiohydantoin derivative. As in the Sanger and Dansyl chloride methods, the N-terminal residue is tagged with an identifiable marker; however, the added advantage of the Edman process is that the remainder of the peptide is left intact. The entire sequence of reactions can be repeated over and over to obtain the sequence of the peptide. This process has subsequently been automated to allow rapid and efficient sequencing of even extremely small quantities of peptide.
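
The repeat-from-the-N-terminus logic of the Edman procedure can be sketched as a loop. The following Python example is a deliberately idealized model: it treats the peptide as a string of one-letter codes, assumes every cycle releases and identifies exactly one residue, and ignores all chemistry and yield losses. The function names and the example peptide are hypothetical.

```python
def edman_cycle(peptide):
    """One idealized cycle: label and release the N-terminal residue.

    Returns (released_residue, remaining_peptide). The remaining peptide
    is left intact, which is what allows the cycle to be repeated.
    """
    if not peptide:
        raise ValueError("no residues left to sequence")
    return peptide[0], peptide[1:]


def edman_sequence(peptide, max_cycles=None):
    """Repeat cycles from the N-terminus inward and collect the residues."""
    identified = []
    remaining = peptide
    cycles = len(peptide) if max_cycles is None else min(max_cycles, len(peptide))
    for _ in range(cycles):
        residue, remaining = edman_cycle(remaining)
        identified.append(residue)
    return "".join(identified), remaining


if __name__ == "__main__":
    read, left = edman_sequence("GIVEQ", max_cycles=2)  # hypothetical peptide
    print("identified so far:", read, "| still intact:", left)
```

By contrast, the Sanger and dansyl chloride methods as described above would correspond to a single call that identifies only the first residue, with the rest of the peptide lost to hydrolysis.
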
Name           3-letter  1-letter  % (E.C.)  MW (residue)  pK    VdW volume  Class
Alanine        ALA       A         13.0      71            -     67          H
Arginine       ARG       R         5.3       157           12.5  148         C+
Asparagine     ASN       N         9.9       114           -     96          P
Aspartate      ASP       D         9.9       114           3.9   91          C-
Cysteine       CYS       C         1.8       103           -     86          P
Glutamate      GLU       E         10.8      128           4.3   109         C-
Glutamine      GLN       Q         10.8      128           -     114         P
Glycine        GLY       G         7.8       57            -     48          N
Histidine      HIS       H         0.7       137           6.0   118         P,C+
Isoleucine     ILE       I         4.4       113           -     124         H
Leucine        LEU       L         7.8       113           -     124         H
Lysine         LYS       K         7.0       129           10.5  135         C+
Methionine     MET       M         3.8       131           -     124         H
Phenylalanine  PHE       F         3.3       147           -     135         H
Proline        PRO       P         4.6       97            -     90          H
Serine         SER       S         6.0       87            -     73          P
Threonine      THR       T         4.6       101           -     93          P
Tryptophan     TRP       W         1.0       186           -     163         P
Tyrosine       TYR       Y         2.2       163           10.1  141         P
Valine         VAL       V         6.0       99            -     105         H
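
One use of the residue MW column is a quick estimate of peptide mass. The sketch below copies the residue masses exactly as they appear in the table above (they are rounded, and some may differ slightly from other references) and adds one water (taken as 18) for the free N- and C-terminal ends. The function name and the example peptide are hypothetical.

```python
# Residue masses copied as-is from the MW column of the table above.
RESIDUE_MW = {
    "A": 71, "R": 157, "N": 114, "D": 114, "C": 103,
    "E": 128, "Q": 128, "G": 57, "H": 137, "I": 113,
    "L": 113, "K": 129, "M": 131, "F": 147, "P": 97,
    "S": 87, "T": 101, "W": 186, "Y": 163, "V": 99,
}
WATER_MW = 18  # assumption: one H2O accounts for the two free termini


def peptide_mass(sequence):
    """Approximate mass of a linear peptide given in one-letter codes."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    return sum(RESIDUE_MW[residue] for residue in sequence.upper()) + WATER_MW


if __name__ == "__main__":
    print(peptide_mass("GA"))  # glycine and alanine residues plus one water
```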

This table is actually worse than a similar table in amino acids. Please compare. This is a useless content fork. Biophys (talk) 01:44, 15 April 2009 (UTC)


I think it unwise to make so much use of the term 'residue' for 'amino acid'. It is bound to cause difficulties for readers who are not chemists. Macdonald-ross (talk) 17:10, 17 September 2012 (UTC)

Section entitled "Protein Structure Prediction Pipeline- SAHG, a comprehensive database of predicted structures of all human protein"

I've removed the following section for the moment (diff). Unfortunately it is an overly-detailed bullet point summary of a relatively niche paper (only cited 7 times since 2010) so isn't really appropriate for this page. If I think of a more specialist article that could benefit from a sentence on it I'll see if I can salvage some of the info, but I suspect it's a little too obscure. T.Shafee(Evo﹠Evo)talk 11:52, 14 December 2015 (UTC)

@MansiG123: Sorry to revert you again, but I really think that this information would be more appropriate in the SAHG article. The information you added is too specific for this page because this page is a general description of protein structure. SAHG is only one of a great many databases of equivalent notability. T.Shafee(Evo﹠Evo)talk 00:48, 15 December 2015 (UTC)

Assessment comment

The comment(s) below were originally left at Talk:Protein structure/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.

high school/SAT biology content and important MCB overview; changed rating from high to top - tameeria 15:19, 18 February 2007 (UTC)

Last edited at 15:19, 18 February 2007 (UTC). Substituted at 03:29, 30 April 2016 (UTC)