Circular permutation in proteins: Difference between revisions

From Wikipedia, the free encyclopedia
Andorsch (talk | contribs)
updating figure legend to new version of the image
Quantum7 (talk | contribs)
Full rewrite of the article with text from the PLoS:CB topic page (doi:10.1371/journal.pcbi.1002445 CC-BY-2.5). A full history of intermediate changes is available at http://topicpages.ploscompbiol.org/wiki/Circular_permutation_in_proteins
[[Image:Circular Permutation In Proteins.svg|thumb|Schematic representation of a circular permutation in two proteins. The first protein (outer circle) has the sequence a-b-c. After the permutation the second protein (inner circle) has the sequence c-a-b. The letters N and C indicate the location of the amino- and carboxy-termini of the protein sequences and how their positions change relative to each other. ]]
'''Circular permutation''' is a process during [[evolution]] that changes the order of [[amino acids]] in a [[protein sequence]], resulting in a [[protein structure]] with different connectivity but a similar overall three-dimensional shape. As a consequence of the circular permutation, the [[N terminus|N-terminus]] of one [[protein]] shows significant [[sequence similarity]] to the [[C terminus|C-terminus]] of the other, and vice versa. Artificially created permutations have been used for various purposes in [[protein engineering]] and [[Protein design|design]]. One of the first naturally occurring circular permutations to be identified was the swaposin family, whose members are circularly permuted versions of saposins.{{PMID|7610480}}


'''Circular permutation''' describes a type of relationship between proteins, whereby the proteins have a changed order of [[Amino acids|amino acids]] in their [[Peptide sequence|protein sequence]], such that the sequence of the first portion of one protein (adjacent to the [[N-terminus|N-terminus]]) is related to that of the second portion of the other protein (near its [[C terminus|C-terminus]]), and vice versa. This is directly analogous to the mathematical notion of a [[cyclic permutation|cyclic permutation]] over the set of residues in a protein.
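
The analogy with a cyclic permutation can be illustrated with a minimal Python sketch. It treats a protein as a plain string of residues and only recognises exact rotations, whereas real permuted proteins have usually diverged in sequence; the function names <code>rotate</code> and <code>is_circular_permutation</code> are illustrative choices and do not come from any published tool.

```python
def rotate(seq: str, cut: int) -> str:
    """Return seq with its termini moved so that residue `cut` (0-based) comes first."""
    if not seq:
        return seq
    cut %= len(seq)
    return seq[cut:] + seq[:cut]


def is_circular_permutation(a: str, b: str) -> bool:
    """True if b is an exact rotation of a (illustrative only: no mutations allowed)."""
    # Every rotation of a occurs as a substring of a + a.
    return len(a) == len(b) and b in a + a


original = "ABC"                 # segments a-b-c, as in the schematic figure
permuted = rotate(original, 2)   # the new N-terminus is segment c
```

Here <code>permuted</code> corresponds to the order c-a-b used in the schematic figure.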
[[Image:Concanavalin A vs Lectin.png|thumb|Two proteins that are related by a circular permutation: [[Concanavalin_a|Concanavalin A]] (left), from {{PDB|3cna}}, and peanut lectin (right), from {{PDB|2pel}}, which is homologous to favin. The termini of the proteins are highlighted by blue and green spheres, and the sequence of residues is indicated by the gradient from blue (N-terminus) to green (C-terminus). The three-dimensional fold of the two proteins is highly similar; however, the N- and C-termini are located at different positions in the protein.<ref name="Cunningham">{{cite journal |author=Cunningham et al. |year=1979 |title=Favin versus concanavalin A: Circularly permuted amino acid sequences |journal=PNAS |volume=76 |issue=7 |pages=3218–3222 |pmid=16592676 |doi=10.1073/pnas.76.7.3218 |pmc=383795}}</ref>]]

Circular permutation can be the result of [[Evolution|evolutionary]] events, [[Post_translational_modification|post-translational modifications]], or [[Protein_engineering|artificially engineered]] mutations. The result is a [[Protein structure|protein structure]] with different connectivity, but overall similar three-dimensional (3D) shape. The [[Homology_(biology)|homology]] between portions of the proteins can be established by observing similar sequences between N- and C-terminal portions of the two proteins, structural similarity, or other methods.

==History==

[[Image:Concanavalin A vs Lectin.png|thumb|Two proteins that are related by a circular permutation. [[Concanavalin_a|Concanavalin A]] (left), from the Protein Data Bank ({{PDB|3cna}}), and peanut lectin (right), from {{PDB|2pel}}, which is homologous to favin. The termini of the proteins are highlighted by blue and green spheres, and the sequence of residues is indicated by the gradient from blue (N-terminus) to green (C-terminus). The 3D fold of the two proteins is highly similar; however, the N- and C- termini are located on different positions of the protein.<ref name="Cunningham"/>]]

In 1979, Bruce Cunningham and his colleagues discovered the first instance of a circularly permuted protein in nature.<ref name="Cunningham">{{cite pmid|16592676}}</ref> After determining the peptide sequence of the [[Lectin|lectin]] protein favin, they noticed its similarity to a known protein, [[Concanavalin A|concanavalin A]], except that the ends were circularly permuted. Later work confirmed the circular permutation between the pair<ref>{{cite pmid|3782132}}</ref> and showed that concanavalin A is permuted [[Posttranslational modification|post-translationally]]<ref>{{cite pmid|3965973}}</ref> through cleavage and an unusual protein ligation.<ref name="Bowles1988"/>

After the discovery of a natural circularly permuted protein, researchers looked for a way to emulate this process. In 1983, David Goldenberg and Thomas Creighton were able to create a circularly permuted version of a protein by [[Chemical ligation|chemically ligating]] the termini to create a [[Cyclic peptide|cyclic protein]], then introducing new termini elsewhere using [[trypsin|trypsin]].<ref name="Goldenberg">{{cite pmid|6188846}}</ref> In 1989, Karolin Luger and her colleagues introduced a genetic method for making circular permutations by carefully fragmenting and ligating DNA.<ref name="Luger1989">{{cite pmid|2643160}}</ref> This method allowed for permutations to be introduced at arbitrary sites, and is still used today to design circularly permuted proteins in the lab.

Despite the early discovery of post-translational circular permutations and the suggestion of a possible genetic mechanism for evolving circular permutants, it was not until 1995 that the first circularly permuted pair of genes was discovered. [[Saposin|Saposins]] are a class of proteins involved in [[Sphingolipid|sphingolipid]] catabolism and [[Lipid|lipid]] antigen presentation in humans. [[Christopher Ponting|Christopher Ponting]] and Robert Russell identified a circularly permuted version of a saposin inserted into plant [[aspartic proteinase|aspartic proteinase]], which they nicknamed [[Plant-specific insert|swaposin]].<ref name="Russell">{{cite pmid|7610480}}</ref> Saposin and swaposin were the first known case of two natural genes related by a circular permutation.

Hundreds of examples of protein pairs related by a circular permutation were subsequently discovered in nature or produced in the laboratory. The [http://sarst.life.nthu.edu.tw/cpdb/ Circular Permutation Database]<ref name=CircularPermutationDatabase>Circular Permutation Database. [http://sarst.life.nthu.edu.tw/cpdb/ http://sarst.life.nthu.edu.tw/cpdb/] Accessed 16 February, 2012.</ref> contains 2,238 circularly permuted protein pairs with known structures, and many more are known without structures.<ref>{{cite pmid|18842637}}</ref> The CyBase database collects proteins that are cyclic, some of which are permuted variants of cyclic wild-type proteins.<ref>{{cite pmid|20564021}}</ref> SISYPHUS is a database that contains a collection of manually curated alignments of proteins with non-trivial relationships, several of which have circular permutations.<ref name="Andreeva">{{cite pmid|17068077}}</ref>

{{-}}


==Evolution==


There are two main models that are currently being used to explain the evolution of circularly permuted proteins: ''permutation by duplication'' and ''fission and fusion''. The two models have compelling examples supporting them, but the relative contribution of each model in evolution is still under debate.<ref name="Weiner06">{{cite pmid|16431849}}</ref> Other, less common, mechanisms have been proposed, such as "cut and paste"<ref name="Bujnicki02">{{cite pmid|11914127}}</ref> or "[[Exon_shuffling|exon shuffling]]".
One model that can explain how circular permutations arise during [[evolution]] is [[gene duplication]] of a precursor [[gene]].<ref name="Jeltsch">{{cite journal |author=Jeltsch A |year=1999 |title=Circular permutations in the molecular evolution of DNA methyltransferases |journal=J. Mol. Evol. |volume=49 |issue=1 |pages=161–164 |pmid=10368444 |doi=10.1007/PL00006529}}</ref> If the two copies become fused, the result is a tandem protein. The [[Directionality (molecular biology)|5' and 3']] parts of the fused gene can subsequently be lost, for example through the insertion of a [[stop codon]].

===Permutation by Duplication===

[[Image:Permutation_by_Duplication.svg|thumb|left|The permutation by duplication mechanism for producing a circular permutation. First, a gene is duplicated in place. Next, start and stop codons are introduced, resulting in a circularly permuted gene.]]

The earliest model proposed for the evolution of circular permutations is the permutation by duplication mechanism.<ref name="Cunningham"/> In this model, a precursor gene first undergoes a duplication and fusion to form a large [[tandem repeat|tandem repeat]]. Next, [[Genetic code#Start/stop codons|start and stop codons]] are introduced at corresponding locations in the duplicated gene, removing redundant sections of the protein.

One surprising prediction of the permutation by duplication mechanism is that intermediate permutations can occur. For instance, the duplicated version of the protein should still be functional, since otherwise evolution would quickly select against such proteins. Likewise, partially duplicated intermediates where only one terminus was truncated should be functional. Such intermediates have been extensively documented in protein families such as [[DNA methyltransferase|DNA methyltransferases]].<ref name="Jeltsch">{{cite pmid|10368444}}</ref>
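
The duplicate-then-truncate logic of this model can be sketched in a few lines of Python. This is a toy illustration on strings, not a model of real genes; <code>new_start</code> (the position of the newly introduced start codon within the first copy) and the function names are assumptions made for the example.

```python
def tandem_duplicate(gene: str) -> str:
    """Step 1: duplication and in-frame fusion gives a tandem repeat."""
    return gene + gene


def one_sided_intermediate(gene: str, new_start: int) -> str:
    """Intermediate in which only the N-terminal side has been truncated."""
    return tandem_duplicate(gene)[new_start:]


def permuted_product(gene: str, new_start: int) -> str:
    """Step 2: a new start in the first copy and a stop at the corresponding
    position in the second copy leave exactly one gene length."""
    tandem = tandem_duplicate(gene)
    return tandem[new_start:new_start + len(gene)]


gene = "ABCDEF"
intermediate = one_sided_intermediate(gene, 4)   # still partially duplicated
product = permuted_product(gene, 4)              # circularly permuted gene
```

In this toy case the final product is the same string that a direct rotation of the original would give, while the intermediate still carries a redundant copy of part of the gene.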


====Saposin and Swaposin====

[[Image:Saposin Swaposin.svg|thumb|right|Suggested relationship between saposin and swaposin. They could have evolved from a similar gene.<ref>{{cite pmid|7610480}}</ref> Both consist of 4 alpha helices with the order of helices being permuted relative to each other.]]

An example of permutation by duplication is the relationship between saposin and swaposin. [[Prosaposin|Saposins]] are highly conserved [[Glycoprotein|glycoproteins]], approximately 80 amino acid residues long, that fold into a structure of four [[Alpha helix|alpha helices]]. They have a nearly identical placement of cysteine residues and glycosylation sites. The [[cDNA|cDNA]] sequence that codes for the saposins is called [[Prosaposin|prosaposin]]. It is a precursor for four cleavage products, the saposins A, B, C, and D. The four saposin domains most likely arose from two tandem duplications of an ancestral gene.<ref>{{cite pmid|11734895}}</ref> This repeat suggests a mechanism for the evolution of the relationship with the [[Plant-specific insert|plant-specific insert]] (PSI). The PSI is a domain of approximately 100 residues that is found exclusively in plants, within plant [[Aspartic_proteinase|aspartic proteases]].<ref>{{cite pmid|7925961}}</ref> It belongs to the saposin-like protein family (SAPLIP) and has the N- and C-termini "swapped", such that the order of helices is 3-4-1-2 compared with saposin, hence the name "swaposin".<ref name="Russell"/> For a review of the functional and structural features of saposin-like proteins, see Bruhn (2005).<ref name="Bruhn05">{{cite pmid|15992358}}</ref>

{{-}}

===Fission and Fusion===

[[Image:Fission-fusion (genetics).svg|thumb|left|The fission and fusion mechanism of circular permutation. Two separate genes arise (potentially from the fission of a single gene). If the genes fuse together in different orders in two orthologues, a circular permutation occurs.]]

Another model for the evolution of circular permutations is the fission and fusion model. The process starts with two partial proteins. These may represent two independent polypeptides (such as two parts of a [[heterodimer|heterodimer]]), or may have originally been halves of a single protein that underwent a [[wiktionary:fission|fission]] event to become two polypeptides.

The two proteins can later fuse together to form a single polypeptide. Regardless of which protein comes first, this fusion protein may show similar function. Thus, if a fusion between two proteins occurs twice in evolution (either between [[Homology_(biology)#Paralogy|paralogues]] within the same species or between [[Homology_(biology)#Orthology|orthologues]] in different species) but in a different order, the resulting fusion proteins will be related by a circular permutation.
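
A minimal Python sketch of this argument follows. The two partial proteins are represented by made-up residue strings (<code>half_1</code> and <code>half_2</code> are hypothetical and not taken from any real protein), and the check assumes no sequence divergence after fusion.

```python
def fuse(first: str, second: str) -> str:
    """Fuse two polypeptides into one chain, `first` at the N-terminal end."""
    return first + second


# Hypothetical fragments standing in for the two partial proteins.
half_1 = "MKTAYIAK"
half_2 = "QRQISFVK"

fusion_in_lineage_a = fuse(half_1, half_2)   # order 1-2
fusion_in_lineage_b = fuse(half_2, half_1)   # order 2-1

# The two fusion products are rotations of each other, so fusion_b
# occurs inside the doubled fusion_a.
related_by_circular_permutation = (
    len(fusion_in_lineage_a) == len(fusion_in_lineage_b)
    and fusion_in_lineage_b in fusion_in_lineage_a * 2
)
```

The cut point of the resulting permutation is the boundary between the two fused fragments.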

Evidence for a particular protein having evolved by a fission and fusion mechanism can be provided by observing the halves of the permutation as independent polypeptides in related species, or by demonstrating experimentally that the two halves can function as separate polypeptides.<ref name=lee11>{{cite pmid|21173271}}</ref>

====Transhydrogenases====

[[Image:Transhydrogenase Circular Permutations.svg|thumb|right|Transhydrogenases in various organisms can be found in three different domain arrangements. In cattle, the three domains are arranged sequentially. In the bacteria E. coli, Rb. capsulatus, and R. rubrum, the transhydrogenase consists of two or three subunits. Finally, transhydrogenase from the protist E. tenella consists of a single subunit that is circularly permuted relative to cattle transhydrogenase.<ref name="Hatefi1996">{{cite pmid|8647343}}</ref>]]

An example of the fission and fusion mechanism can be found in [[NAD(P)%2B_transhydrogenase_(B-specific)|nicotinamide nucleotide transhydrogenases]].<ref name="Hatefi1996"/> These are [[Cell_membrane|membrane]]-bound [[Enzyme|enzymes]] that catalyze the transfer of a hydride ion between [[NAD%2B|NAD(H)]] and [[NADPH|NADP(H)]] in a reaction that is coupled to [[Proton_pump|transmembrane proton translocation]]. They consist of three major functional units (I, II, and III) that can be found in different arrangements in [[Bacteria|bacteria]], [[Protozoa|protozoa]], and higher [[Eukaryote|eukaryotes]]. Phylogenetic analysis suggests that the three groups of domain arrangements were acquired and fused independently.<ref name="Weiner06"/>

{{-}}

=== Other Processes that can Lead to Circular Permutations ===

====Post-translational Modification====

The two evolutionary models mentioned above describe ways in which genes may be circularly permuted, resulting in a circularly permuted [[mRNA|mRNA]] after [[transcription|transcription]]. Proteins can also be circularly permuted via [[Post-translational modification|post-translational modification]], without permuting the underlying gene. Circular permutations can happen spontaneously through [[auto-catalysis|auto-catalysis]], as in the case of concanavalin A.<ref name="Bowles1988">{{cite pmid|3070848}}</ref> Alternatively, permutation may require added reagents for cleavage and ligation, such as the chemical ligation and [[trypsin]] digestion used by Goldenberg and Creighton.<ref name="Goldenberg"/>

==The Role of Circular Permutations in Protein Engineering==

Many proteins have their termini located close together in 3D space.<ref>{{cite pmid|6864804}}</ref><ref name="YuLutz">{{cite pmid|21087800}}</ref> Because of this, it is often possible to design circular permutations of proteins. Today, circular permutations are generated routinely in the lab using standard genetics techniques.<ref name=Luger1989 /> Although some permutation sites prevent the protein from folding correctly, many permutants have been created with nearly identical structure and function to the original protein.
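
At the sequence level, designing a permutant amounts to joining the original termini and opening the chain at a new site. The Python sketch below illustrates only that bookkeeping. The short linker <code>"GGS"</code>, the parameter names, and the example sequence are assumptions chosen for illustration; choosing a workable linker and cut site for a real protein requires structural information that this sketch does not model.

```python
def design_permutant(seq: str, cut: int, linker: str = "GGS") -> str:
    """Join the original C- and N-termini with `linker` and introduce new
    termini so that residue `cut` (0-based) becomes the new N-terminus."""
    if not 0 < cut < len(seq):
        raise ValueError("cut must fall strictly inside the sequence")
    # New chain: old residues cut..end, then the linker, then old residues 0..cut-1.
    return seq[cut:] + linker + seq[:cut]


wild_type = "ABCDEF"                            # hypothetical sequence
permutant = design_permutant(wild_type, 2, "GG")
```

With an empty linker the result is a plain rotation of the wild-type sequence.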

The motivation for creating a circular permutant of a protein can vary. Scientists may want to improve some property of the protein, for example to:
* '''Reduce [[Proteolysis|proteolytic]] susceptibility.''' The rate at which proteins are broken down can have a large impact on their activity in cells. Since termini are often accessible to [[Protease|proteases]], designing a circularly permuted protein with less accessible termini can increase the lifespan of that protein in the cell.<ref name="Whitehead09">{{cite pmid|19622546}}</ref>
* '''Improve [[Catalysis|catalytic activity]].''' Circularly permuting a protein can sometimes increase the rate at which it catalyzes a chemical reaction, leading to more efficient proteins.<ref name="Cheltsov01">{{cite pmid|11279050}}</ref>
* '''Alter substrate or [[Ligand_binding|ligand binding]].''' Circularly permuting a protein can result in the loss of substrate binding, but can occasionally lead to novel ligand binding activity or altered substrate specificity.<ref name="Qian05">{{cite pmid|16190688}}</ref>
* '''Improve [[thermostability|thermostability]].''' Making proteins active over a wider range of temperatures and conditions can improve their utility.<ref name="Topell99">{{cite pmid|10471794}}</ref>

Alternatively, scientists may be interested in properties of the original protein, such as:
* '''Fold order.''' Determining the order in which different parts of a protein fold is challenging due to the extremely fast time scales involved. Circularly permuted versions of proteins will often fold in a different order, providing information about the folding of the original protein.<ref name=Viguera96>{{cite pmid|8836105}}</ref><ref name=Capraro08>{{cite pmid|18806223}}</ref><ref>{{cite pmid|8819162}}</ref>
* '''Essential structural elements.''' Artificial circularly permuted proteins can allow parts of a protein to be selectively deleted. This gives insight into which structural elements are essential or not.<ref name=Huang11>{{cite pmid|21910151}}</ref>
* '''Modify [[Protein quaternary structure|quaternary structure]].''' Circularly permuted proteins have been shown to take on different quaternary structure than wild-type proteins.<ref>{{cite pmid|11344321}}</ref>
* '''Find insertion sites for other proteins.''' Inserting one protein as a domain into another protein can be useful. For instance, inserting calmodulin into [[Green_fluorescent_protein|green fluorescent protein]] (GFP) allowed researchers to measure the activity of calmodulin via the fluorescence of the split-GFP.<ref name="Baird1999">{{cite pmid|10500161}}</ref> Regions of GFP that tolerate the introduction of circular permutation are more likely to accept the addition of another protein while retaining the function of both proteins.
* '''Design of novel [[Biocatalyst|biocatalysts]] and biosensors.''' Introducing circular permutations can be used to design proteins to catalyze specific chemical reactions,<ref name="Turner09">{{cite pmid|19620998}}</ref><ref name="Cheltsov01"/> or to detect the presence of certain molecules using proteins. For instance, the GFP-calmodulin fusion described above can be used to detect the level of calcium ions in a sample.<ref name="Baird1999"/>

==Algorithmic Detection of Circular Permutations==

Many [[List of sequence alignment software|sequence alignment]] and [[Structural alignment software|protein structure alignment algorithms]] have been developed assuming linear data representations and as such are not able to detect circular permutations between proteins. Two examples of frequently used methods that have problems correctly aligning proteins related by circular permutation are [[Dynamic_programming|dynamic programming]] and many [[Hidden_Markov_model|hidden Markov models]]. As an alternative to these, a number of algorithms are built on top of non-linear approaches and are able to detect [[topology|topology]]-independent similarities, or employ modifications allowing them to circumvent the limitations of dynamic programming. The table below is a collection of such methods.

The algorithms are classified according to the type of input they require. ''Sequence''-based algorithms require only the sequence of two proteins in order to create an alignment. Sequence methods are generally fast and suitable for searching whole genomes for circularly permuted pairs of proteins. ''Structure''-based methods require 3D structures of both proteins being considered. They are often slower than sequence-based methods, but are able to detect circular permutations between distantly related proteins with low sequence similarity. Some structural methods are ''topology independent'', meaning that they are also able to detect more complex rearrangements than circular permutation.


Many [[protein structure]]s are observed to have their [[N terminus|N-]] and [[C terminus|C-termini]] in close proximity in space.<ref name="Yu"/> This characteristic helps explain why such permutation events can be tolerated: the amino- and carboxy-termini of the protein are fused and new termini are introduced elsewhere, while the overall arrangement of [[Protein_structure#Secondary_structure|secondary structure elements]] remains essentially unmodified.

{| class="sortable wikitable" border="0" align="center" style="border: 1px solid #999; background-color:#FFFFFF"
|-align="left" bgcolor="#CCCCCC"
! Name !! Type !! Description !! Author !! Year !! Availability !! Reference
|-
| FBPLOT || Sequence || Draws dot plots of suboptimal sequence alignments || Zuker || 1991 || ||<ref>{{cite pmid|1920426}}</ref>
|-
| Bachar et al. || Structure, topology independent || Uses geometric hashing for the topology-independent comparison of proteins || Bachar et al. || 1993 || ||<ref>{{cite pmid|8506262}}</ref>
|-
| Uliel et al. || Sequence || First suggestion of how a sequence comparison algorithm for the detection of circular permutations can work || Uliel et al. || 1999 || ||<ref>{{cite pmid|10743559}}</ref>
|-
| SHEBA || Structure || Duplicates a sequence in the middle; uses SHEBA algorithm for structure alignment; determines new cut position after structure alignment|| Jung & Lee || 2001 || ||<ref>{{cite pmid|11514678}}</ref>
|-
| MultiProt || Structure, topology independent || Calculates a sequence-order-independent multiple protein structure alignment || Shatsky || 2004 || [http://bioinfo3d.cs.tau.ac.il/MultiProt/ server, download] ||<ref>{{cite pmid|15162494}}</ref>
|-
| RASPODOM || Sequence || Modified [[Needleman–Wunsch algorithm|Needleman & Wunsch sequence comparison algorithm]] || Weiner et al. || 2005 || [http://iebservices.uni-muenster.de/raspodom/ server] ||<ref>{{cite pmid|15788783}}</ref>
|-
| CPSARST || Structure || Describes protein structures as one-dimensional text strings by using a Ramachandran sequential transformation (RST) algorithm. Detects circular permutations through a duplication of the sequence representation and a "double filter-and-refine" strategy. || Lo & Lyu || 2008 || [http://sarst.life.nthu.edu.tw/ server] ||<ref>{{cite pmid|18201387}}</ref>
|-
| GANGSTA+ || Structure || Works in two stages: Stage one identifies coarse alignments based on secondary structure elements. Stage two refines the alignment on residue level and extends into loop regions. || Schmidt-Goenner et al. || 2009 || [http://agknapp.chemie.fu-berlin.de/gplus/ server], [http://agknapp.chemie.fu-berlin.de/gplus/?page=downloads download] ||<ref>{{cite pmid|20112421}}</ref>
|-
| SANA || Structure || Detect initial aligned fragment pairs (AFPs). Build network of possible AFPs. Use random-mate algorithm to connect components to a graph.|| Wang et al. || 2010 || [http://zhangroup.aporc.org/bioinfo/SANA download] ||<ref>{{cite pmid|20127263}}</ref>
|-
| CE-CP || Structure || Built on top of the [[Structural_alignment#Combinatorial_extension|combinatorial extension]] algorithm. Duplicates atoms before alignment, truncates results after alignment || Bliven et al. || 2010 || [http://www.rcsb.org/pdb/workbench/workbench.do server], [http://source.rcsb.org/ download] ||<ref>{{cite pmid|20937596}}</ref>
|}
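
Several of the methods listed above (SHEBA, CPSARST, CE-CP) share one idea: duplicate one of the two inputs so that an ordinary linear comparison can span the old termini, then truncate the result. The Python sketch below applies that idea to sequences in the simplest possible form, an ungapped comparison of equal-length strings scored by fraction of identical residues. It is a hedged illustration of the duplication trick only and is not the algorithm of any tool in the table; the function name and scoring are assumptions.

```python
from typing import Tuple


def best_cut_point(a: str, b: str) -> Tuple[int, float]:
    """Slide `a` along the doubled sequence b + b (no gaps) and return
    (offset, identity) for the window that best matches `a`.

    `offset` is the position in `b` that aligns with the N-terminus of `a`;
    an offset of 0 means the best match is the unpermuted alignment.
    """
    if not a or len(a) != len(b):
        raise ValueError("this sketch only handles equal-length, non-empty sequences")
    doubled = b + b          # duplication lets a window wrap around b's termini
    n = len(a)
    best_offset, best_identity = 0, -1.0
    for offset in range(n):
        window = doubled[offset:offset + n]   # truncation back to one length
        identity = sum(x == y for x, y in zip(a, window)) / n
        if identity > best_identity:
            best_offset, best_identity = offset, identity
    return best_offset, best_identity


exact = best_cut_point("ABCDEFGH", "FGHABCDE")
diverged = best_cut_point("ABCDEFGH", "FGHABXDE")   # one substitution
```

A practical sequence method would replace the ungapped identity count with a gapped alignment score, and structure-based methods apply the same duplication to coordinates rather than letters.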


==Role in protein engineering==

Artificially constructed circularly permuted proteins are being used in [[protein engineering]] to stabilize proteins. They have been shown to decrease the [[Proteolysis|proteolytic]] susceptibility of [[recombinant protein]]s.<ref name="whitehead">{{cite journal |author=Whitehead et al. |year=2009 |title=Tying up the loose ends: circular permutation decreases the proteolytic susceptibility of recombinant proteins |journal=Prot Eng. Design |volume=22 |issue=10 |pages=607–613 |pmid=19622546 }}</ref> They have also been used to insert [[Protein domain|domains]] into [[green fluorescent protein]].<ref name="Baird">{{cite journal |author=Baird GS, Zacharias DA, Tsien RY. |year=1999 |title=Circular permutation and receptor insertion within green fluorescent proteins. |journal=PNAS |volume=96 |issue=20 |pages=11241–11246 |pmid=10500161 |doi=10.1073/pnas.96.20.11241 |pmc=18018}}</ref> Several studies have used circular permutations to manipulate protein scaffolds, resulting in improved [[Catalysis|catalytic activity]] and altered substrate or [[ligand binding]] affinity. Circular permutations have also been used to enable the design of novel [[biocatalyst]]s and biosensors. For a review, see Yu and Lutz (2011).<ref name="Yu">{{cite journal |author=Yu and Lutz |year=2011 |title=Circular permutation: a different way to engineer enzyme structure and function. |journal=Trends Biotechnol. |volume=29 |issue=1 |pages=18–25 |pmid=21087800 |doi=10.1016/j.tibtech.2010.10.004}}</ref>

==Further Reading==

* David Goodsell (2010) [http://www.rcsb.org/pdb/101/motm.do?momID=124 ''Concanavalin A and Circular Permutation''] Research Collaboratory for Structural Biology (RCSB) Protein Data Bank (PDB) Molecule of the Month April 2010
* Yu and Lutz (2011), for a review of the use of circular permutation in protein design.<ref name="YuLutz"/>
* Weiner & Bornberg-Bauer (2006), for a review of evolutionary mechanisms for circular permutations.<ref name="Weiner06" />
* [[Cyclic permutation|Cyclic permutation]]


== References ==
{{reflist|2}}




[[Category:Proteins]]
[[Category:Proteins|Category:Proteins]]
[[Category:Evolutionary biology]]
[[Category:Permutations|Category:Proteins]]
[[Category:Evolutionary processes]]

Revision as of 22:25, 29 March 2012

Schematic representation of a circular permutation in two proteins. The first protein (outer circle) has the sequence a-b-c. After the permutation the second protein (inner circle) has the sequence c-a-b. The letters N and C indicate the location of the amino- and carboxy-termini of the protein sequences and how their positions change relative to each other.

Circular permutation describes a type of relationship between proteins, whereby the proteins have a changed order of amino acids in their protein sequence, such that the sequence of the first portion of one protein (adjacent to the N-terminus) is related to that of the second portion of the other protein (near its C-terminus), and vice versa. This is directly analogous to the mathematical notion of a cyclic permutation over the set of residues in a protein.

Circular permutation can be the result of evolutionary events, post-translational modifications, or artificially engineered mutations. The result is a protein structure with different connectivity, but overall similar three-dimensional (3D) shape. The homology between portions of the proteins can be established by observing similar sequences between N- and C-terminal portions of the two proteins, structural similarity, or other methods.

History

Two proteins that are related by a circular permutation. Concanavalin A (left), from the Protein Data Bank (PDB: 3cna​), and peanut lectin (right), from PDB: 2pel​, which is homologous to favin. The termini of the proteins are highlighted by blue and green spheres, and the sequence of residues is indicated by the gradient from blue (N-terminus) to green (C-terminus). The 3D fold of the two proteins is highly similar; however, the N- and C- termini are located on different positions of the protein.[1]

In 1979, Bruce Cunningham and his colleagues discovered the first instance of a circularly permuted protein in nature.[1] After determining the peptide sequence of the lectin protein favin, they noticed its similarity to a known protein, concanavalin A, except that the ends were circularly permuted. Later work confirmed the circular permutation between the pair[2] and showed that concanavalin A is permuted post-translationally[3] through cleavage and an unusual protein ligation.[4]

After the discovery of a natural circularly permuted protein, researchers looked for a way to emulate this process. In 1983, David Goldenberg and Thomas Creighton were able to create a circularly permuted version of a protein by chemically ligating the termini to create a cyclic protein, then introducing new termini elsewhere using trypsin.[5] In 1989, Karolin Luger and her colleagues introduced a genetic method for making circular permutations by carefully fragmenting and ligating DNA.[6] This method allowed for permutations to be introduced at arbitrary sites, and is still used today to design circularly permuted proteins in the lab.

Despite the early discovery of post-translational circular permutations and the suggestion of a possible genetic mechanism for evolving circular permutants, it was not until 1995 that the first circularly permuted pair of genes was discovered. Saposins are a class of proteins involved in sphingolipid catabolism and lipid antigen presentation in humans. Christopher Ponting and Robert Russell identified a circularly permuted version of a saposin inserted into plant aspartic proteinase, which they nicknamed swaposin.[7] Saposin and swaposin were the first known case of two natural genes related by a circular permutation.

Hundreds of examples of protein pairs related by a circular permutation were subsequently discovered in nature or produced in the laboratory. The Circular Permutation Database[8] contains 2,238 circularly permuted protein pairs with known structures, and many more are known without structures.[9] The CyBase database collects proteins that are cyclic, some of which are permuted variants of cyclic wild-type proteins.[10] SISYPHUS is a database that contains a collection of manually curated alignments of proteins with non-trivial relationships, several of which have circular permutations.[11]

Evolution

There are two main models that are currently being used to explain the evolution of circularly permuted proteins: permutation by duplication and fission and fusion. The two models have compelling examples supporting them, but the relative contribution of each model in evolution is still under debate.[12] Other, less common, mechanisms have been proposed, such as "cut and paste"[13] or "exon shuffling".

Permutation by Duplication

The permutation by duplication mechanism for producing a circular permutation. First, a gene is duplicated in place. Next, start and stop codons are introduced, resulting in a circularly permuted gene.

The earliest model proposed for the evolution of circular permutations is the permutation by duplication mechanism.[1] In this model, a precursor gene first undergoes a duplication and fusion to form a large tandem repeat. Next, start and stop codons are introduced at corresponding locations in the duplicated gene, removing redundant sections of the protein.

One surprising prediction of the permutation by duplication mechanism is that intermediate permutations can occur. For instance, the duplicated version of the protein should still be functional, since otherwise evolution would quickly select against such proteins. Likewise, partially duplicated intermediates where only one terminus was truncated should be functional. Such intermediates have been extensively documented in protein families such as DNA methyltransferases.[14]


Saposin and Swaposin

Suggested relationship between saposin and swaposin. They could have evolved from a similar gene.[15] Both consist of 4 alpha helices with the order of helices being permuted relative to each other.

An example of permutation by duplication is the relationship between saposin and swaposin. Saposins are highly conserved glycoproteins of approximately 80 amino acid residues that form a four-alpha-helix structure. They have a nearly identical placement of cysteine residues and glycosylation sites. The cDNA sequence that codes for saposin is called prosaposin. It is a precursor for four cleavage products, the saposins A, B, C, and D. The four saposin domains most likely arose from two tandem duplications of an ancestral gene.[16] This repeat suggests a mechanism for the evolution of the relationship with the plant-specific insert (PSI). The PSI is a domain of approximately 100 residues that is found exclusively in plants, within plant aspartic proteases.[17] It belongs to the saposin-like protein family (SAPLIP) and has the N- and C-termini "swapped", such that the order of helices is 3-4-1-2 compared with saposin, thus leading to the name "swaposin".[7] For a review of the functional and structural features of saposin-like proteins, see Bruhn (2005).[18]

Fission and Fusion

The fission and fusion mechanism of circular permutation. Two separate genes arise (potentially from the fission of a single gene). If the genes fuse together in different orders in two orthologues, a circular permutation occurs.

Another model for the evolution of circular permutations is the fission and fusion model. The process starts with two partial proteins. These may represent two independent polypeptides (such as two parts of a heterodimer), or may have originally been halves of a single protein that underwent a fission event to become two polypeptides.

The two proteins can later fuse together to form a single polypeptide. Regardless of which protein comes first, this fusion protein may show similar function. Thus, if a fusion between two proteins occurs twice in evolution (either between paralogues within the same species or between orthologues in different species) but in a different order, the resulting fusion proteins will be related by a circular permutation.
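As a minimal sketch of why the order of fusion matters, the example below fuses two toy fragments in both orders and confirms that the products are rotations of each other. The fragment sequences and the helper name rotation_offset are hypothetical and chosen only for illustration.

```python
def rotation_offset(reference: str, candidate: str) -> int:
    """Return k such that candidate == reference[k:] + reference[:k], or -1."""
    if len(reference) != len(candidate):
        return -1
    # Every rotation of a string occurs inside the string written twice.
    return (reference + reference).find(candidate)


# Two hypothetical partial proteins (toy sequences, not real proteins).
half_a = "MKTAYIA"
half_b = "KQRQISF"

fusion_ab = half_a + half_b  # fusion event in one lineage
fusion_ba = half_b + half_a  # independent fusion in the opposite order

offset = rotation_offset(fusion_ab, fusion_ba)
print(offset)
```

Here the offset equals the length of the first fragment, which marks the position in one fusion protein that corresponds to the start of the other.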

Evidence for a particular protein having evolved by a fission and fusion mechanism can be provided by observing the halves of the permutation as independent polypeptides in related species, or by demonstrating experimentally that the two halves can function as separate polypeptides.[19]

Transhydrogenases

Transhydrogenases in various organisms can be found in three different domain arrangements. In cattle, the three domains are arranged sequentially. In the bacteria E. coli, Rb. capsulatus, and R. rubrum, the transhydrogenase consists of two or three subunits. Finally, transhydrogenase from the protist E. tenella consists of a single subunit that is circularly permuted relative to cattle transhydrogenase.[20]

An example of the fission and fusion mechanism can be found in nicotinamide nucleotide transhydrogenases.[20] These are membrane-bound enzymes that catalyze the transfer of a hydride ion between NAD(H) and NADP(H) in a reaction that is coupled to transmembrane proton translocation. They consist of three major functional units (I, II, and III) that can be found in different arrangements in bacteria, protozoa, and higher eukaryotes. Phylogenetic analysis suggests that the three groups of domain arrangements were acquired and fused independently.[12]

Other Processes that can Lead to Circular Permutations

Post-translational Modification

The two evolutionary models mentioned above describe ways in which genes may be circularly permuted, resulting in a circularly permuted mRNA after transcription. Proteins can also be circularly permuted via post-translational modification, without permuting the underlying gene. Circular permutations can happen spontaneously through auto-catalysis, as in the case of concanavalin A.[4] Alternatively, permutation may require restriction enzymes and ligases.[5]

The Role of Circular Permutations in Protein Engineering

Many proteins have their termini located close together in 3D space.[21][22] Because of this, it is often possible to design circular permutations of proteins. Today, circular permutations are generated routinely in the lab using standard genetic techniques.[6] Although some permutation sites prevent the protein from folding correctly, many permutants have been created with nearly identical structure and function to the original protein.
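At the sequence level, designing a permutant amounts to joining the original termini and opening new termini elsewhere. The sketch below assumes that a short peptide linker bridges the original termini; the linker sequence "GGSGG", the function name, and the toy input are illustrative assumptions, not a recommended design.

```python
def circular_permutant(seq: str, cut_site: int, linker: str = "GGSGG") -> str:
    """Build the sequence of a designed circular permutant.

    seq      -- original protein sequence (N- to C-terminus)
    cut_site -- 0-based index of the residue that becomes the new N-terminus
    linker   -- hypothetical peptide joining the original C- and N-termini
    """
    if not 0 < cut_site < len(seq):
        raise ValueError("cut_site must fall strictly inside the sequence")
    # New order: [cut_site .. old C-terminus] + linker + [old N-terminus .. cut_site)
    return seq[cut_site:] + linker + seq[:cut_site]


original = "ABCDEFGH"  # toy sequence
print(circular_permutant(original, 5))
```

Whether a given cut site yields a correctly folding protein cannot be read from the sequence alone and has to be tested experimentally.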

The motivation for creating a circular permutant of a protein can vary. Scientists may want to improve some property of the protein, such as

  • Reduce proteolytic susceptibility. The rate at which proteins are broken down can have a large impact on their activity in cells. Since termini are often accessible to proteases, designing a circularly permuted protein with less accessible termini can increase the lifespan of that protein in the cell.[23]
  • Improve catalytic activity. Circularly permuting a protein can sometimes increase the rate at which it catalyzes a chemical reaction, leading to more efficient proteins.[24]
  • Alter substrate or ligand binding. Circularly permuting a protein can result in the loss of substrate binding, but can occasionally lead to novel ligand binding activity or altered substrate specificity.[25]
  • Improve thermostability. Making proteins active over a wider range of temperatures and conditions can improve their utility.[26]

Alternatively, scientists may be interested in properties of the original protein, such as

  • Fold order. Determining the order in which different parts of a protein fold is challenging due to the extremely fast time scales involved. Circularly permuted versions of proteins will often fold in a different order, providing information about the folding of the original protein.[27][28][29]
  • Essential structural elements. Artificial circularly permuted proteins can allow parts of a protein to be selectively deleted. This gives insight into which structural elements are essential or not.[30]
  • Modify quaternary structure. Circularly permuted proteins have been shown to take on different quaternary structure than wild-type proteins.[31]
  • Find insertion sites for other proteins. Inserting one protein as a domain into another protein can be useful. For instance, inserting calmodulin into green fluorescent protein (GFP) allowed researchers to measure the activity of calmodulin via the fluorescence of the split GFP.[32] Regions of GFP that tolerate the introduction of circular permutation are more likely to accept the addition of another protein while retaining the function of both proteins.
  • Design of novel biocatalysts and biosensors. Introducing circular permutations can be used to design proteins to catalyze specific chemical reactions,[33][24] or to detect the presence of certain molecules using proteins. For instance, the GFP-calmodulin fusion described above can be used to detect the level of calcium ions in a sample.[32]

Algorithmic Detection of Circular Permutations

Many sequence alignment and protein structure alignment algorithms have been developed assuming linear data representations and as such are not able to detect circular permutations between proteins. Two examples of frequently used methods that have problems correctly aligning proteins related by circular permutation are dynamic programming and many hidden Markov models. As an alternative to these, a number of algorithms are built on top of non-linear approaches and are able to detect topology-independent similarities, or employ modifications allowing them to circumvent the limitations of dynamic programming. The table below is a collection of such methods.

The algorithms are classified according to the type of input they require. Sequence-based algorithms require only the sequence of two proteins in order to create an alignment. Sequence methods are generally fast and suitable for searching whole genomes for circularly permuted pairs of proteins. Structure-based methods require 3D structures of both proteins being considered. They are often slower than sequence-based methods, but are able to detect circular permutations between distantly related proteins with low sequence similarity. Some structural methods are topology independent, meaning that they are also able to detect more complex rearrangements than circular permutation.
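A common idea behind several sequence-based approaches is to compare one sequence against a duplicated copy of the other, so that every possible rotation appears as a contiguous window. The sketch below illustrates this with a deliberately simple scoring scheme: it assumes equal-length sequences, allows no gaps, and scores by percent identity. It is not the algorithm of any tool listed in this article, and the function name is hypothetical.

```python
def best_rotation(seq_a: str, seq_b: str) -> tuple:
    """Return (offset, identity) of the rotation of seq_a that best matches seq_b.

    Simplifying assumptions: equal lengths, no gaps, identity scoring.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sketch assumes equal-length, non-empty sequences")
    n = len(seq_a)
    doubled = seq_a + seq_a  # every rotation of seq_a is a window of this string
    best_offset, best_identity = 0, -1.0
    for offset in range(n):
        window = doubled[offset:offset + n]
        matches = sum(x == y for x, y in zip(window, seq_b))
        identity = matches / n
        if identity > best_identity:
            best_offset, best_identity = offset, identity
    return best_offset, best_identity


# Toy pair: seq_b is seq_a rotated by 6 positions with one substitution (E -> X).
print(best_rotation("ABCDEFGHIJ", "GHIJABCDXF"))
```

An offset of 0 means the best match is the unpermuted alignment; a non-zero offset with high identity suggests a circular permutation. Practical methods replace the identity count with a proper alignment score so that insertions and deletions are tolerated.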

Name | Type | Description | Author | Year | Availability | Reference
FBPLOT | Sequence | Draws dot plots of suboptimal sequence alignments | Zuker | 1991 | - | [34]
Bachar et al. | Structure, topology independent | Uses geometric hashing for the topology-independent comparison of proteins | Bachar et al. | 1993 | - | [35]
Uliel et al. | Sequence | First suggestion of how a sequence comparison algorithm for the detection of circular permutations can work | Uliel et al. | 1999 | - | [36]
SHEBA | Structure | Duplicates a sequence in the middle; uses the SHEBA algorithm for structure alignment; determines the new cut position after structure alignment | Jung & Lee | 2001 | - | [37]
Multiprot | Structure, topology independent | Calculates a sequence-order-independent multiple protein structure alignment | Shatsky | 2004 | server, download | [38]
RASPODOM | Sequence | Modified Needleman & Wunsch sequence comparison algorithm | Weiner et al. | 2005 | server | [39]
CPSARST | Structure | Describes protein structures as one-dimensional text strings by using a Ramachandran sequential transformation (RST) algorithm. Detects circular permutations through a duplication of the sequence representation and a "double filter-and-refine" strategy. | Lo, Lyu | 2008 | server | [40]
GANGSTA+ | Structure | Works in two stages: stage one identifies coarse alignments based on secondary structure elements; stage two refines the alignment at the residue level and extends it into loop regions. | Schmidt-Goenner et al. | 2009 | server, download | [41]
SANA | Structure | Detects initial aligned fragment pairs (AFPs), builds a network of possible AFPs, and uses a random-mate algorithm to connect components to a graph. | Wang et al. | 2010 | download | [42]
CE-CP | Structure | Built on top of the combinatorial extension algorithm. Duplicates atoms before alignment and truncates results after alignment. | Bliven et al. | 2010 | server, download | [43]
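In a dot plot, a circular permutation typically appears as two parallel diagonals rather than one. The following sketch counts shared k-mers per diagonal to make this visible without drawing anything. It is a simplified illustration and not a reimplementation of FBPLOT or any other tool in the table; the function name and the word length k are assumptions.

```python
from collections import Counter


def diagonal_counts(seq_a: str, seq_b: str, k: int = 3) -> Counter:
    """Count exact k-mer matches on each dot-plot diagonal (j - i).

    i indexes seq_a and j indexes seq_b; matches between unpermuted,
    ungapped sequences would all fall on diagonal 0.
    """
    positions = {}
    for i in range(len(seq_a) - k + 1):
        positions.setdefault(seq_a[i:i + k], []).append(i)
    counts = Counter()
    for j in range(len(seq_b) - k + 1):
        for i in positions.get(seq_b[j:j + k], []):
            counts[j - i] += 1
    return counts


# Toy pair: seq_b is seq_a rotated by 6 positions.
counts = diagonal_counts("ABCDEFGHIJ", "GHIJABCDEF")
print(sorted(counts.items()))
```

For this toy pair the matches fall on two diagonals whose offsets differ by the sequence length, which is the signature of a rotation.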

Further Reading

  • David Goodsell (2010). Concanavalin A and Circular Permutation. Research Collaboratory for Structural Biology (RCSB) Protein Data Bank (PDB) Molecule of the Month, April 2010.
  • Yu and Lutz (2011), for a review of the use of circular permutation in protein design.[22]
  • Weiner & Bornberg-Bauer (2006), for a review of evolutionary mechanisms for circular permutations.[12]
  • Cyclic permutation

References

  1. ^ a b c PMID 16592676.
  2. ^ PMID 3782132.
  3. ^ PMID 3965973.
  4. ^ a b PMID 3070848.
  5. ^ a b PMID 6188846.
  6. ^ a b PMID 2643160.
  7. ^ a b PMID 7610480.
  8. ^ Circular Permutation Database. http://sarst.life.nthu.edu.tw/cpdb/ Accessed 16 February 2012.
  9. ^ PMID 18842637.
  10. ^ PMID 20564021.
  11. ^ PMID 17068077.
  12. ^ a b c PMID 16431849.
  13. ^ PMID 11914127.
  14. ^ PMID 10368444.
  15. ^ PMID 7610480.
  16. ^ PMID 11734895.
  17. ^ PMID 7925961.
  18. ^ PMID 15992358.
  19. ^ PMID 21173271.
  20. ^ a b PMID 8647343.
  21. ^ PMID 6864804.
  22. ^ a b PMID 21087800.
  23. ^ PMID 19622546.
  24. ^ a b PMID 11279050.
  25. ^ PMID 16190688.
  26. ^ PMID 10471794.
  27. ^ PMID 8836105.
  28. ^ PMID 18806223.
  29. ^ PMID 8819162.
  30. ^ PMID 21910151.
  31. ^ PMID 11344321.
  32. ^ a b PMID 10500161.
  33. ^ PMID 19620998.
  34. ^ PMID 1920426.
  35. ^ PMID 8506262.
  36. ^ PMID 10743559.
  37. ^ PMID 11514678.
  38. ^ PMID 15162494.
  39. ^ PMID 15788783.
  40. ^ PMID 18201387.
  41. ^ PMID 20112421.
  42. ^ PMID 20127263.
  43. ^ PMID 20937596.