
Low complexity regions in proteins: Difference between revisions

From Wikipedia, the free encyclopedia
G d a75 (talk | contribs)
Added entire sections on Structure, Functions, Properties, Evolution
''Low complexity regions in protein sequences'' (LCRs), also defined in some contexts as ''compositionally biased regions'' (CBRs), are regions in protein sequences whose composition and complexity differ from those of most proteins, which are normally associated with globular structure <ref name=":0">{{Cite journal|last=Wootton|first=John C.|date=1994-09|title=Non-globular domains in protein sequences: Automated segmentation using complexity measures|url=https://linkinghub.elsevier.com/retrieve/pii/0097848594850232|journal=Computers & Chemistry|language=en|volume=18|issue=3|pages=269–285|doi=10.1016/0097-8485(94)85023-2}}</ref><ref name=":1">{{cite journal |vauthors=Mier P, Paladin L, Tamana S, Petrosian S, Hajdu-Soltész B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernadó P, Gáspári Z, Ouzounis CA, Promponas VJ, Kajava AV, Hancock JM, Tosatto SC, Dosztanyi Z, Andrade-Navarro MA |title=Disentangling the complexity of low complexity proteins|journal=Brief Bioinform |date=30 January 2019 |pmid= 30698641|doi=10.1093/bib/bbz007|doi-access=free}}</ref>. LCRs differ from other protein regions in their [[protein structure|structure]], function and [[evolution]].

== Structure ==
LCRs were originally thought to be unstructured and flexible linkers that served to separate the structured (and functional) domains of complex proteins <ref name=":2">{{Cite journal|last=Huntley|first=Melanie A.|last2=Golding|first2=G. Brian|date=2002-07-01|title=Simple sequences are rare in the Protein Data Bank|url=http://doi.wiley.com/10.1002/prot.10150|journal=Proteins: Structure, Function, and Genetics|language=en|volume=48|issue=1|pages=134–140|doi=10.1002/prot.10150|issn=0887-3585}}</ref>, but they are also capable of forming secondary structures, like helices (more often) and even sheets <ref>{{Cite journal|last=Kumari|first=Bandana|last2=Kumar|first2=Ravindra|last3=Kumar|first3=Manish|date=2015|title=Low complexity and disordered regions of proteins have different structural and amino acid preferences|url=http://xlink.rsc.org/?DOI=C4MB00425F|journal=Molecular BioSystems|language=en|volume=11|issue=2|pages=585–594|doi=10.1039/C4MB00425F|issn=1742-206X}}</ref>. They may play a structural role in proteins such as collagens, myosin, keratins, silk, cell wall proteins <ref>{{Cite journal|last=Luo|first=H.|last2=Nijveen|first2=H.|date=2014-07-01|title=Understanding and identifying amino acid repeats|url=https://academic.oup.com/bib/article-lookup/doi/10.1093/bib/bbt003|journal=Briefings in Bioinformatics|language=en|volume=15|issue=4|pages=582–591|doi=10.1093/bib/bbt003|issn=1467-5463|pmc=PMC4103538|pmid=23418055}}</ref>. 
Tandem repeats of short oligopeptides that are rich in glycine, proline, serine or threonine are capable of forming flexible structures that bind ligands under certain pH and temperature conditions <ref>{{Cite web|last=Matsushima|first=Norio|last2=Yoshida|first2=Hitoshi|last3=Kumaki|first3=Yasuhiro|last4=Kamiya|first4=Masakatsu|last5=Tanaka|first5=Takanori|last6=Kretsinger|first6=Yoshinobu Izumi and Robert H.|date=2008-11-30|title=Flexible Structures and Ligand Interactions of Tandem Repeats Consisting of Proline, Glycine, Asparagine, Serine, and/or Threonine Rich Oligopeptides in Proteins|url=https://www.eurekaselect.com/83531/article|access-date=2020-11-03|website=Current Protein & Peptide Science|language=en|doi=10.2174/138920308786733886}}</ref>. Proline is a well-known alpha-helix breaker; however, amino acid repeats composed of proline may form polyproline helices <ref>{{Cite journal|last=Adzhubei|first=Alexei A.|last2=Sternberg|first2=Michael J.E.|last3=Makarov|first3=Alexander A.|date=2013-06|title=Polyproline-II Helix in Proteins: Structure and Function|url=https://linkinghub.elsevier.com/retrieve/pii/S0022283613001666|journal=Journal of Molecular Biology|language=en|volume=425|issue=12|pages=2100–2132|doi=10.1016/j.jmb.2013.03.018}}</ref>.

== Functions ==
LCRs were originally thought of as ‘junk’ regions or as neutral linkers between domains; however, experimental and computational evidence increasingly indicates that they may play important adaptive and conserved roles, relevant to biotechnology, heterologous protein expression and medicine, as well as to our understanding of protein evolution <ref name=":3">{{Cite journal|last=Ntountoumi|first=Chrysa|last2=Vlastaridis|first2=Panayotis|last3=Mossialos|first3=Dimitris|last4=Stathopoulos|first4=Constantinos|last5=Iliopoulos|first5=Ioannis|last6=Promponas|first6=Vasilios|last7=Oliver|first7=Stephen G|last8=Amoutzias|first8=Grigoris D|date=2019-11-04|title=Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved|url=https://academic.oup.com/nar/article/47/19/9998/5559688|journal=Nucleic Acids Research|language=en|volume=47|issue=19|pages=9998–10009|doi=10.1093/nar/gkz730|issn=0305-1048|pmc=PMC6821194|pmid=31504783}}</ref>.

LCRs of eukaryotic proteins have been implicated in human diseases <ref>{{Cite journal|last=Karlin|first=S.|last2=Brocchieri|first2=L.|last3=Bergman|first3=A.|last4=Mrazek|first4=J.|last5=Gentles|first5=A. J.|date=2002-01-08|title=Amino acid runs in eukaryotic proteomes and disease associations|url=http://www.pnas.org/cgi/doi/10.1073/pnas.012608599|journal=Proceedings of the National Academy of Sciences|language=en|volume=99|issue=1|pages=333–338|doi=10.1073/pnas.012608599|issn=0027-8424|pmc=PMC117561|pmid=11782551}}</ref><ref>{{Cite journal|last=Mirkin|first=Sergei M.|date=2007-06|title=Expandable DNA repeats and human disease|url=http://www.nature.com/articles/nature05977|journal=Nature|language=en|volume=447|issue=7147|pages=932–940|doi=10.1038/nature05977|issn=0028-0836}}</ref>, especially neurodegenerative ones, where they tend to form amyloids in humans and other eukaryotes <ref>{{Cite journal|last=Kumari|first=Bandana|last2=Kumar|first2=Ravindra|last3=Chauhan|first3=Vipin|last4=Kumar|first4=Manish|date=2018-10-30|title=Comparative functional analysis of proteins containing low-complexity predicted amyloid regions|url=https://peerj.com/articles/5823|journal=PeerJ|language=en|volume=6|pages=e5823|doi=10.7717/peerj.5823|issn=2167-8359|pmc=PMC6214233|pmid=30397544}}</ref>.

They have been reported to have adhesive roles <ref>{{Cite journal|last=So|first=Christopher R.|last2=Fears|first2=Kenan P.|last3=Leary|first3=Dagmar H.|last4=Scancella|first4=Jenifer M.|last5=Wang|first5=Zheng|last6=Liu|first6=Jinny L.|last7=Orihuela|first7=Beatriz|last8=Rittschof|first8=Dan|last9=Spillmann|first9=Christopher M.|last10=Wahl|first10=Kathryn J.|date=2016-12|title=Sequence basis of Barnacle Cement Nanostructure is Defined by Proteins with Silk Homology|url=http://www.nature.com/articles/srep36219|journal=Scientific Reports|language=en|volume=6|issue=1|pages=36219|doi=10.1038/srep36219|issn=2045-2322|pmc=PMC5099703|pmid=27824121}}</ref>, function in excreted sticky proteins used for prey capture <ref>{{Cite journal|last=Haritos|first=Victoria S.|last2=Niranjane|first2=Ajay|last3=Weisman|first3=Sarah|last4=Trueman|first4=Holly E.|last5=Sriskantha|first5=Alagacone|last6=Sutherland|first6=Tara D.|date=2010-11-07|title=Harnessing disorder: onychophorans use highly unstructured proteins, not silks, for prey capture|url=https://royalsocietypublishing.org/doi/10.1098/rspb.2010.0604|journal=Proceedings of the Royal Society B: Biological Sciences|language=en|volume=277|issue=1698|pages=3255–3263|doi=10.1098/rspb.2010.0604|issn=0962-8452|pmc=PMC2981920|pmid=20519222}}</ref>, or have roles as transducers of molecular movement, e.g. in the prokaryotic TonB/TolA systems <ref>{{Cite journal|last=Brewer|first=S.|last2=Tolley|first2=M.|last3=Trayer|first3=I.P.|last4=Barr|first4=G.C.|last5=Dorman|first5=C.J.|last6=Hannavy|first6=K.|last7=Higgins|first7=C.F.|last8=Evans|first8=J.S.|last9=Levine|first9=B.A.|last10=Wormald|first10=M.R.|date=1990-12|title=Structure and function of X-Pro dipeptide repeats in the TonB proteins of Salmonella typhimurium and Escherichia coli|url=https://linkinghub.elsevier.com/retrieve/pii/S0022283699800084|journal=Journal of Molecular Biology|language=en|volume=216|issue=4|pages=883–895|doi=10.1016/S0022-2836(99)80008-4}}</ref>.

LCRs may form surfaces for interaction with phospholipid bilayers <ref>{{Cite journal|last=Robison|first=Aaron D.|last2=Sun|first2=Simou|last3=Poyton|first3=Matthew F.|last4=Johnson|first4=Gregory A.|last5=Pellois|first5=Jean-Philippe|last6=Jungwirth|first6=Pavel|last7=Vazdar|first7=Mario|last8=Cremer|first8=Paul S.|date=2016-09-08|title=Polyarginine Interacts More Strongly and Cooperatively than Polylysine with Phospholipid Bilayers|url=https://pubs.acs.org/doi/10.1021/acs.jpcb.6b05604|journal=The Journal of Physical Chemistry B|language=en|volume=120|issue=35|pages=9287–9296|doi=10.1021/acs.jpcb.6b05604|issn=1520-6106|pmc=PMC5912336|pmid=27571288}}</ref>, or as positive charge clusters for DNA binding <ref name=":3" /><ref name=":4">{{Cite journal|last=Zhu|first=Z. Y.|last2=Karlin|first2=S.|date=1996-08-06|title=Clusters of charged residues in protein three-dimensional structures.|url=http://www.pnas.org/cgi/doi/10.1073/pnas.93.16.8350|journal=Proceedings of the National Academy of Sciences|language=en|volume=93|issue=16|pages=8350–8355|doi=10.1073/pnas.93.16.8350|issn=0027-8424|pmc=PMC38674|pmid=8710874}}</ref><ref>{{Cite journal|last=Kushwaha|first=Ambuj K.|last2=Grove|first2=Anne|date=2013-02-01|title=C-terminal low-complexity sequence repeats of Mycobacterium smegmatis Ku modulate DNA binding|url=https://portlandpress.com/bioscirep/article/doi/10.1042/BSR20120105/82147/Cterminal-lowcomplexity-sequence-repeats-of|journal=Bioscience Reports|language=en|volume=33|issue=1|pages=e00016|doi=10.1042/BSR20120105|issn=0144-8463|pmc=PMC3553676|pmid=23167261}}</ref>, or as negative or even histidine-acidic charge clusters for coordinating calcium, magnesium or zinc ions <ref name=":3" /><ref name=":4" />.

They may also play important roles in protein translation, as tRNA ‘sponges’, slowing down translation in order to allow time for the correct folding of the nascent polypeptide chain <ref>{{Cite journal|last=Frugier|first=Magali|last2=Bour|first2=Tania|last3=Ayach|first3=Maya|last4=Santos|first4=Manuel A.S.|last5=Rudinger-Thirion|first5=Joëlle|last6=Théobald-Dietrich|first6=Anne|last7=Pizzi|first7=Elizabetta|date=2010-01-21|title=Low Complexity Regions behave as tRNA sponges to help co-translational folding of plasmodial proteins|url=http://doi.wiley.com/10.1016/j.febslet.2009.11.004|journal=FEBS Letters|language=en|volume=584|issue=2|pages=448–454|doi=10.1016/j.febslet.2009.11.004}}</ref>. They may even function as frame-shift checkpoints, by shifting to an unusual amino acid content that makes the protein highly unstable or insoluble, which in turn triggers fast recycling, before any further cellular damage <ref>{{Cite journal|last=Tyedmers|first=Jens|last2=Mogk|first2=Axel|last3=Bukau|first3=Bernd|date=2010-11|title=Cellular strategies for controlling protein aggregation|url=http://www.nature.com/articles/nrm2993|journal=Nature Reviews Molecular Cell Biology|language=en|volume=11|issue=11|pages=777–788|doi=10.1038/nrm2993|issn=1471-0072}}</ref><ref>{{Cite journal|last=Ling|first=Jiqiang|last2=Cho|first2=Chris|last3=Guo|first3=Li-Tao|last4=Aerni|first4=Hans R.|last5=Rinehart|first5=Jesse|last6=Söll|first6=Dieter|date=2012-12|title=Protein Aggregation Caused by Aminoglycoside Action Is Prevented by a Hydrogen Peroxide Scavenger|url=https://linkinghub.elsevier.com/retrieve/pii/S1097276512008295|journal=Molecular Cell|language=en|volume=48|issue=5|pages=713–722|doi=10.1016/j.molcel.2012.10.001|pmc=PMC3525788|pmid=23122414}}</ref>.

Analyses on model and non-model eukaryotic proteomes have revealed that LCRs are frequently found in proteins involved in binding of nucleic acids (DNA or RNA), in transcription, receptor activity, development, reproduction and immunity whereas metabolic proteins are depleted of LCRs <ref name=":2" /><ref name=":5">{{Cite journal|last=Haerty|first=Wilfried|last2=Golding|first2=G. Brian|date=2010-10|editor-last=Bonen|editor-first=Linda|title=Low-complexity sequences and single amino acid repeats: not just “junk” peptide sequences|url=http://www.nrcresearchpress.com/doi/10.1139/G10-063|journal=Genome|language=en|volume=53|issue=10|pages=753–762|doi=10.1139/G10-063|issn=0831-2796}}</ref><ref>{{Cite journal|last=Faux|first=N. G.|date=2005-03-21|title=Functional insights from the distribution and role of homopeptide repeat-containing proteins|url=http://www.genome.org/cgi/doi/10.1101/gr.3096505|journal=Genome Research|language=en|volume=15|issue=4|pages=537–551|doi=10.1101/gr.3096505|issn=1088-9051|pmc=PMC1074368|pmid=15805494}}</ref><ref>{{Citation|last=Albà|first=M.M.|title=Amino Acid Repeats and the Structure and Evolution of Proteins|date=2007|url=https://www.karger.com/Article/FullText/107607|work=Genome Dynamics|pages=119–130|editor-last=Volff|editor-first=J.-N.|place=Basel|publisher=KARGER|language=en|doi=10.1159/000107607|isbn=978-3-8055-8340-4|access-date=2020-11-03|last2=Tompa|first2=P.|last3=Veitia|first3=R.A.}}</ref>. A bioinformatics study of the Uniprot annotation of LCR containing proteins observed that 44% (9751/22259) of Bacterial and 44% (662/1521) of Archaeal LCRs are detected in proteins of unknown function, however, a significant number of proteins of known function (from many different species), especially those involved in translation and the ribosome, nucleic acid binding, metal-ion binding, and protein folding were also found to contain LCRs <ref name=":3" />.

== Properties ==
LCRs are more abundant in eukaryotes, but they also have a significant presence in many prokaryotes <ref name=":3" />. On average, 0.05 and 0.07% of the bacterial and archaeal proteomes (total amino acids of LCRs in a given proteome/total amino acids of that proteome) form LCRs whereas for five model eukaryotic proteomes (human, fruitfly, yeast, fission yeast, ''Arabidopsis'') this coverage was significantly higher (on average, 0.4%; between 2 and 23 times higher than prokaryotes) <ref name=":3" />.
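
The coverage figure quoted above is a simple ratio of residue counts. As a minimal illustration, it could be computed as in the sketch below. The function name, the toy proteome and the 0-based, half-open, non-overlapping interval convention are assumptions made for this example and are not taken from the cited study.

```python
from typing import Dict, List, Tuple


def lcr_coverage(proteome: Dict[str, str],
                 lcrs: Dict[str, List[Tuple[int, int]]]) -> float:
    """Percentage of all proteome residues that lie inside annotated LCRs.

    `proteome` maps a protein identifier to its amino acid sequence.
    `lcrs` maps a protein identifier to (start, end) intervals, assumed
    to be 0-based, half-open and non-overlapping.
    """
    total_residues = sum(len(seq) for seq in proteome.values())
    lcr_residues = sum(end - start
                       for spans in lcrs.values()
                       for start, end in spans)
    return 100.0 * lcr_residues / total_residues


# Hypothetical two-protein "proteome" with a single 40-residue LCR.
toy_proteome = {"P1": "M" * 960 + "G" * 40, "P2": "A" * 1000}
toy_lcrs = {"P1": [(960, 1000)]}

# 40 LCR residues out of 2000 residues in total, i.e. 2.0 %
coverage = lcr_coverage(toy_proteome, toy_lcrs)
```

Overlapping annotations would have to be merged before they are passed to such a function, otherwise residues would be counted more than once.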

Eukaryotic LCRs tend to be longer than prokaryotic LCRs <ref name=":3" />. The average eukaryotic LCR is 42 amino acids long, whereas bacterial, archaeal and phage LCRs are on average 38, 36 and 33 amino acids long, respectively <ref name=":3" />.

In the Archaea, the halobacterium ''Natrialba magadii'' has the highest number of LCRs and the highest enrichment for LCRs <ref name=":3" />. In Bacteria, ''Enhygromyxa salina'', a deltaproteobacterium that belongs to the myxobacteria, has the highest number of LCRs and the highest enrichment for LCRs <ref name=":3" />. Intriguingly, four of the top five bacteria with the highest enrichment for LCRs are also myxobacteria <ref name=":3" />.

The three most enriched amino acids within LCRs of Bacteria are proline, glycine and alanine, whereas in Archaea they are threonine, aspartate and proline <ref name=":3" />. In Phages, they are alanine, glycine and proline <ref name=":3" />. Glycine and proline emerge as very enriched amino acids in all three evolutionary lineages, whereas alanine is highly enriched in Bacteria and Phages but not enriched in Archaea. On the other hand, hydrophobic (M, I, L, V) and aromatic amino acids (F, Y, W) as well as cysteine, arginine and asparagine are heavily under-represented in LCRs <ref name=":3" />. Very similar trends for amino acids with a high (G, A, P, S, Q) and low (M, V, L, I, W, F, R, C) occurrence within LCRs have been observed in eukaryotes as well <ref>{{Cite journal|last=Marcotte|first=Edward M.|last2=Pellegrini|first2=Matteo|last3=Yeates|first3=Todd O.|last4=Eisenberg|first4=David|date=1999-10|title=A census of protein repeats|url=https://linkinghub.elsevier.com/retrieve/pii/S0022283699931364|journal=Journal of Molecular Biology|language=en|volume=293|issue=1|pages=151–160|doi=10.1006/jmbi.1999.3136}}</ref><ref name=":5" />. This observed pattern of certain amino acids being over-represented (enriched for) or under-represented in LCRs could be partially explained by the energy cost for synthesis or metabolism of each of the amino acids <ref name=":3" />. Another possible explanation, which does not exclude the previous explanation of energy cost could be the reactivity of certain amino acids <ref name=":3" />. 
For example, cysteine is a very reactive amino acid that would not be tolerated in high numbers within a small region of a protein <ref>{{Cite journal|last=Marino|first=Stefano M.|last2=Gladyshev|first2=Vadim N.|date=2012-02-10|title=Analysis and Functional Prediction of Reactive Cysteine Residues|url=http://www.jbc.org/lookup/doi/10.1074/jbc.R111.275578|journal=Journal of Biological Chemistry|language=en|volume=287|issue=7|pages=4419–4425|doi=10.1074/jbc.R111.275578|issn=0021-9258|pmc=PMC3281665|pmid=22157013}}</ref>. Similarly, in mammalian cells, extremely hydrophobic regions can form non-specific protein–protein interactions among themselves and with other moderately hydrophobic regions <ref>{{Cite journal|last=Dorsman|first=J. C.|date=2002-06-15|title=Strong aggregation and increased toxicity of polyleucine over polyglutamine stretches in mammalian cells|url=https://academic.oup.com/hmg/article-lookup/doi/10.1093/hmg/11.13.1487|journal=Human Molecular Genetics|volume=11|issue=13|pages=1487–1496|doi=10.1093/hmg/11.13.1487}}</ref><ref>{{Cite journal|last=Oma|first=Yoko|last2=Kino|first2=Yoshihiro|last3=Sasagawa|first3=Noboru|last4=Ishiura|first4=Shoichi|date=2004-05-14|title=Intracellular Localization of Homopolymeric Amino Acid-containing Proteins Expressed in Mammalian Cells|url=http://www.jbc.org/lookup/doi/10.1074/jbc.M309887200|journal=Journal of Biological Chemistry|language=en|volume=279|issue=20|pages=21217–21222|doi=10.1074/jbc.M309887200|issn=0021-9258}}</ref>. Thus, their presence may disturb the balance of protein–protein interaction networks within the cell, especially if the carrier proteins are highly expressed <ref name=":3" />. A third explanation may be based on micro-evolutionary forces and, more specifically, on the bias of DNA polymerase slippage for certain di-, tri- or tetra-nucleotides <ref name=":3" />.

=== Amino acid enrichment for certain functional categories of LCRs ===
A bioinformatics analysis of prokaryotic LCRs identified five types of amino acid enrichment for certain functional categories of LCRs <ref name=":3" />:

* Proteins with GO terms related to polysaccharide binding and processing were enriched for serine and threonine in their LCRs.
* Proteins with GO terms related to RNA binding and processing were enriched for arginine in their LCRs.
* Proteins with GO terms related to DNA binding and processing were especially enriched for lysine, but also for glycine, tyrosine, phenylalanine and glutamine in their LCRs.
* Proteins with GO terms related to metal binding and more specifically to cobalt or nickel-binding were enriched mostly for histidine but also for aspartate in their LCRs.
* Proteins with GO terms related to protein folding were enriched for glycine, methionine and phenylalanine in their LCRs.

Based on the above observations and analyses, a neural network web server named LCR-hound has been developed to predict LCRs and their function <ref name=":3" />.

== Evolution ==
LCRs are very interesting from a micro and macro evolutionary perspective <ref name=":3" />. They may be generated by DNA slippage, recombination and repair <ref>{{Cite journal|last=Ellegren|first=Hans|date=2004-06|title=Microsatellites: simple sequences with complex evolution|url=http://www.nature.com/articles/nrg1348|journal=Nature Reviews Genetics|language=en|volume=5|issue=6|pages=435–445|doi=10.1038/nrg1348|issn=1471-0056}}</ref>. Thus, they are linked to recombination hotspots and may even possibly facilitate cross-over <ref>{{Cite journal|last=Verstrepen|first=Kevin J|last2=Jansen|first2=An|last3=Lewitter|first3=Fran|last4=Fink|first4=Gerald R|date=2005-09|title=Intragenic tandem repeats generate functional variability|url=http://www.nature.com/articles/ng1618|journal=Nature Genetics|language=en|volume=37|issue=9|pages=986–990|doi=10.1038/ng1618|issn=1061-4036|pmc=PMC1462868|pmid=16086015}}</ref><ref>{{Cite journal|last=Siwach|first=Pratibha|last2=Pophaly|first2=Saurabh Dilip|last3=Ganesh|first3=Subramaniam|date=2006-07-01|title=Genomic and Evolutionary Insights into Genes Encoding Proteins with Single Amino Acid Repeats|url=http://academic.oup.com/mbe/article/23/7/1357/1065078/Genomic-and-Evolutionary-Insights-into-Genes|journal=Molecular Biology and Evolution|language=en|volume=23|issue=7|pages=1357–1369|doi=10.1093/molbev/msk022|issn=1537-1719}}</ref>. 
By originating from genetic instability, they may cause, at the DNA level, a certain region of the protein to expand or contract and even cause frame-shifts (phase-variants) that affect microbial pathogenicity or provide raw material for evolution <ref>{{Cite journal|last=Moxon|first=Richard|last2=Bayliss|first2=Chris|last3=Hood|first3=Derek|date=2006-12|title=Bacterial Contingency Loci: The Role of Simple Sequence DNA Repeats in Bacterial Adaptation|url=http://www.annualreviews.org/doi/10.1146/annurev.genet.40.110405.090442|journal=Annual Review of Genetics|language=en|volume=40|issue=1|pages=307–333|doi=10.1146/annurev.genet.40.110405.090442|issn=0066-4197}}</ref>. Most intriguingly, they may provide a window into the very early evolution of life <ref name=":3" /><ref>{{Cite journal|last=Toll-Riera|first=M.|last2=Rado-Trilla|first2=N.|last3=Martys|first3=F.|last4=Alba|first4=M. M.|date=2012-03-01|title=Role of Low-Complexity Sequences in the Formation of Novel Protein Coding Sequences|url=https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msr263|journal=Molecular Biology and Evolution|language=en|volume=29|issue=3|pages=883–886|doi=10.1093/molbev/msr263|issn=0737-4038}}</ref>. During early evolution, when only few amino acids were available and the primary genetic code was still expanding its repertoire, the first proteins were assumed to be short, repetitive and therefore, of low complexity <ref>{{Cite journal|last=Ohno|first=S.|last2=Epplen|first2=J. 
T.|date=1983-06-01|title=The primitive code and repeats of base oligomers as the primordial protein-encoding sequence.|url=http://www.pnas.org/cgi/doi/10.1073/pnas.80.11.3391|journal=Proceedings of the National Academy of Sciences|language=en|volume=80|issue=11|pages=3391–3395|doi=10.1073/pnas.80.11.3391|issn=0027-8424|pmc=PMC394049|pmid=6574491}}</ref><ref name=":6">{{Cite journal|last=Trifonov|first=Edward N.|date=2009-09|title=The origin of the genetic code and of the earliest oligopeptides|url=https://linkinghub.elsevier.com/retrieve/pii/S0923250809000576|journal=Research in Microbiology|language=en|volume=160|issue=7|pages=481–486|doi=10.1016/j.resmic.2009.05.004}}</ref>. Thus, modern LCRs could represent primordial aspects of the evolution towards the protein world and may provide clues about the functions of the early proto-peptides <ref name=":3" />.

Most studies have focused on the evolution and the functional and structural roles of eukaryotic LCRs <ref name=":3" />. However, a comprehensive study of prokaryotic LCRs from many diverse prokaryotic lineages provides a unique opportunity to understand the origin, evolution and nature of these regions. Because of the large effective population sizes and short generation times of prokaryotes, the ''de novo'' emergence of a mildly or moderately deleterious amino acid repeat or LCR should quickly be filtered out by strong selective forces <ref name=":3" />. This must especially be the case for LCRs found in highly expressed proteins, since they should also have a great impact on the energy burden of protein translation <ref>{{Cite journal|last=Akashi|first=Hiroshi|last2=Gojobori|first2=Takashi|date=2002-03-19|title=Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis|url=http://www.pnas.org/lookup/doi/10.1073/pnas.062526999|journal=Proceedings of the National Academy of Sciences|language=en|volume=99|issue=6|pages=3695–3700|doi=10.1073/pnas.062526999|issn=0027-8424|pmc=PMC122586|pmid=11904428}}</ref><ref>{{Cite journal|last=Barton|first=Michael D.|last2=Delneri|first2=Daniela|last3=Oliver|first3=Stephen G.|last4=Rattray|first4=Magnus|last5=Bergman|first5=Casey M.|date=2010-08-17|editor-last=Bähler|editor-first=Jürg|title=Evolutionary Systems Biology of Amino Acid Biosynthetic Cost in Yeast|url=https://dx.plos.org/10.1371/journal.pone.0011935|journal=PLoS ONE|language=en|volume=5|issue=8|pages=e11935|doi=10.1371/journal.pone.0011935|issn=1932-6203|pmc=PMC2923148|pmid=20808905}}</ref>. Thus, any prokaryotic LCRs that constitute evolutionary accidents with no functional significance should not be fixed by genetic drift and consequently should not show any conservation among moderately distant evolutionary relatives <ref name=":3" />. 
Conversely, any LCR found among homologs of several moderately distant prokaryotic species very probably has a functional role <ref name=":3" />.

== LCRs and the protopeptides of the early genetic code ==
The amino acids with the highest frequency in LCRs are glycine and alanine, with their respective codons GGC and GCC being the most frequent, as well as complementary <ref name=":3" />. In eukaryotes and more specifically in chordates (such as human, mouse, chicken, zebrafish and sea squirt), alanine- and glycine-rich LCRs are over-represented in recently formed LCRs and probably are better tolerated by the cell <ref>{{Cite journal|last=Radó-Trilla|first=Núria|last2=Albà|first2=MMar|date=2012|title=Dissecting the role of low-complexity regions in the evolution of vertebrate proteins|url=http://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2148-12-155|journal=BMC Evolutionary Biology|language=en|volume=12|issue=1|pages=155|doi=10.1186/1471-2148-12-155|issn=1471-2148|pmc=PMC3523016|pmid=22920595}}</ref>. Intriguingly, it has also been suggested that they represent the very first two amino acids <ref name=":7">{{Cite journal|last=Higgs|first=Paul G.|last2=Pudritz|first2=Ralph E.|date=2009-06|title=A Thermodynamic Basis for Prebiotic Amino Acid Synthesis and the Nature of the First Genetic Code|url=http://www.liebertpub.com/doi/10.1089/ast.2008.0280|journal=Astrobiology|language=en|volume=9|issue=5|pages=483–490|doi=10.1089/ast.2008.0280|issn=1531-1074}}</ref> and codons <ref name=":6" /><ref>{{Cite journal|last=Trifonov|first=E.N|date=2000-12|title=Consensus temporal order of amino acids and evolution of the triplet code|url=https://linkinghub.elsevier.com/retrieve/pii/S0378111900004765|journal=Gene|language=en|volume=261|issue=1|pages=139–151|doi=10.1016/S0378-1119(00)00476-5}}</ref><ref>{{Cite journal|last=Trifonov|first=Edward N.|date=2004-08-01|title=The Triplet Code From First Principles|url=https://doi.org/10.1080/07391102.2004.10506975|journal=Journal of Biomolecular Structure and Dynamics|volume=22|issue=1|pages=1–11|doi=10.1080/07391102.2004.10506975|issn=0739-1102|pmid=15214800}}</ref> of the early genetic code. 
Thus, these two codons and their respective amino acids must have been constituents of the earliest oligopeptides, with a length of 10–55 amino acids <ref>{{Cite journal|last=Ferris|first=James P.|last2=Hill|first2=Aubrey R.|last3=Liu|first3=Rihe|last4=Orgel|first4=Leslie E.|date=1996-05|title=Synthesis of long prebiotic oligomers on mineral surfaces|url=http://www.nature.com/articles/381059a0|journal=Nature|language=en|volume=381|issue=6577|pages=59–61|doi=10.1038/381059a0|issn=0028-0836}}</ref> and very low complexity. Based on several different criteria and sources of data, Higgs and Pudritz <ref name=":7" /> suggest G, A, D, E, V, S, P, I, L, T as the early amino acids of the genetic code. Trifonov's work largely agrees with this categorization and proposes that the early amino acids in chronological order are G, A, D, V, S, P, E, L, T, R. An evolutionary analysis observed that many of the amino acids of the suggested very early genetic code (with the exception of the hydrophobic ones) are significantly enriched in bacterial LCRs <ref name=":3" />. Most of the later additions to the genetic code are significantly under-represented in bacterial LCRs <ref name=":3" />. The authors of that analysis therefore hypothesize that, in a cell-free environment, the early genetic code may have also produced low complexity oligopeptides from valine and leucine <ref name=":3" />. However, later on, within a more complex cellular environment, these highly hydrophobic LCRs became inappropriate or even toxic from a protein interaction perspective and have been selected against ever since <ref name=":3" />. In addition, they hypothesize that the very early protopeptides did not have a nucleic acid binding role <ref name=":3" />, because DNA- and RNA-binding LCRs are highly enriched in glycine, arginine and lysine, whereas arginine and lysine are not among the amino acids of the proposed early genetic code.
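
The statement that the codons GGC and GCC are complementary can be checked directly: each is the reverse complement of the other, so the two strands of a GGC repeat would encode glycine on one strand and alanine on the other. The short sketch below illustrates this. The helper name and the two-entry codon table are assumptions made for this example, restricted to the two codons discussed here.

```python
# Watson-Crick base pairing for DNA, used to complement a strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the two codons discussed in the text (standard genetic code).
CODON_TO_AMINO_ACID = {"GGC": "Gly", "GCC": "Ala"}


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


for codon, amino_acid in CODON_TO_AMINO_ACID.items():
    partner = reverse_complement(codon)
    print(codon, amino_acid, "<->", partner, CODON_TO_AMINO_ACID[partner])
```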


== Detection methods ==
{{Main|List of software to detect low complexity regions in proteins}}


Low complexity regions in proteins can be computationally detected from sequence using various methods and definitions, as reviewed elsewhere <ref name=":1" />. One of the most popular approaches to identifying LCRs is to measure their Shannon entropy <ref name=":0" />. The lower the calculated entropy, the more homogeneous the region is in terms of amino acid content. In addition, a neural network web server, LCR-hound, has been developed to predict the function of an LCR based on its amino acid or di-amino acid content <ref name=":3" />.
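
The entropy-based idea can be illustrated with a short sliding-window sketch. This is a simplified example and not the algorithm of any particular published tool. The function names, the window length of 12 residues and the threshold of 2.2 bits are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits per residue, of a segment's composition."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(sequence: str, window: int = 12,
                           threshold: float = 2.2):
    """Return (start, end, entropy) for every window whose entropy is
    at or below the threshold. Coordinates are 0-based and half-open."""
    hits = []
    for start in range(len(sequence) - window + 1):
        entropy = shannon_entropy(sequence[start:start + window])
        if entropy <= threshold:
            hits.append((start, start + window, entropy))
    return hits
```

A homopolymeric window such as twelve consecutive glutamines has an entropy of 0 bits, whereas a window in which all twenty amino acids are equally frequent reaches the maximum of log<sub>2</sub>(20), roughly 4.32 bits. Overlapping windows that pass the threshold would normally be merged into a single reported region.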


== References ==

Revision as of 14:13, 3 November 2020

Low complexity regions in protein sequences (LCRs), also defined in some contexts as compositionally biased regions (CBRs), are regions in protein sequences that differ from the composition and complexity of most proteins that is normally associated with globular structure [1][2]. LCRs have different properties from normal regions regarding structure, function and evolution.

Structure

LCRs were originally thought to be unstructured and flexible linkers that served to separate the structured (and functional) domains of complex proteins [3], but they are also capable of forming secondary structures, like helices (more often) and even sheets [4]. They may play a structural role in proteins such as collagens, myosin, keratins, silk, cell wall proteins [5]. Tandem repeats of short oligopeptides that are rich in glycine, proline, serine or threonine are capable of forming flexible structures that bind ligands under certain pH and temperature conditions [6]. Proline is a well-know alpha-helix breaker, however, amino acid repeats comprised of proline may form poly-proline helices [7].

Functions

LCRs were originally thought of as ‘junk’ regions or as neutral linkers between domains; however, experimental and computational evidence increasingly indicates that they may play important adaptive and conserved roles relevant to biotechnology, heterologous protein expression and medicine, as well as to our understanding of protein evolution [8].

LCRs of eukaryotic proteins have been implicated in human diseases [9][10], especially neurodegenerative ones, where they tend to form amyloids in humans and other eukaryotes [11].

They have been reported to have adhesive roles [12], to function in excreted sticky proteins used for prey capture [13], or to act as transducers of molecular movement, e.g. in the prokaryotic TonB/TolA systems [14].

LCRs may form surfaces for interaction with phospholipid bilayers [15], positive charge clusters for DNA binding [8][16][17], or negative or even histidine-acidic charge clusters for coordinating calcium, magnesium or zinc ions [8][16].

They may also play important roles in protein translation, as tRNA ‘sponges’, slowing down translation in order to allow time for the correct folding of the nascent polypeptide chain [18]. They may even function as frame-shift checkpoints, by shifting to an unusual amino acid content that makes the protein highly unstable or insoluble, which in turn triggers fast recycling, before any further cellular damage [19][20].

Analyses of model and non-model eukaryotic proteomes have revealed that LCRs are frequently found in proteins involved in binding of nucleic acids (DNA or RNA), transcription, receptor activity, development, reproduction and immunity, whereas metabolic proteins are depleted of LCRs [3][21][22][23]. A bioinformatics study of the UniProt annotation of LCR-containing proteins observed that 44% (9751/22259) of bacterial and 44% (662/1521) of archaeal LCRs are detected in proteins of unknown function; however, a significant number of proteins of known function (from many different species), especially those involved in translation and the ribosome, nucleic acid binding, metal-ion binding and protein folding, were also found to contain LCRs [8].

Properties

LCRs are more abundant in eukaryotes, but they also have a significant presence in many prokaryotes [8]. On average, 0.05% and 0.07% of the bacterial and archaeal proteomes, respectively, form LCRs (total amino acids of LCRs in a given proteome divided by the total amino acids of that proteome), whereas for five model eukaryotic proteomes (human, fruit fly, yeast, fission yeast, Arabidopsis) this coverage was significantly higher (on average, 0.4%; between 2 and 23 times higher than in prokaryotes) [8].
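The coverage ratio described above (LCR residues divided by all residues in a proteome) can be sketched in a few lines of Python. This is only an illustration: the function name `lcr_coverage`, the dictionary layout and the 0-based half-open interval convention are assumptions made here, not details taken from the cited study.

```python
def lcr_coverage(proteome: dict[str, str],
                 lcrs: dict[str, list[tuple[int, int]]]) -> float:
    """Fraction of all proteome residues that fall inside annotated LCRs.

    proteome maps a protein identifier to its amino acid sequence.
    lcrs maps a protein identifier to its LCR intervals, given as
    0-based, half-open (start, end) pairs that are assumed not to overlap.
    """
    total_residues = sum(len(seq) for seq in proteome.values())
    if total_residues == 0:
        return 0.0
    lcr_residues = sum(end - start
                       for spans in lcrs.values()
                       for start, end in spans)
    return lcr_residues / total_residues
```

Multiplying the returned fraction by 100 gives a percentage comparable in form to the figures quoted above.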

Eukaryotic LCRs tend to be longer than prokaryotic LCRs [8]. The average eukaryotic LCR is 42 amino acids long, whereas bacterial, archaeal and phage LCRs average 38, 36 and 33 amino acids, respectively [8].

In the Archaea, the halobacterium Natrialba magadii has the highest number of LCRs and the highest enrichment for LCRs [8]. In Bacteria, Enhygromyxa salina, a delta proteobacterium that belongs to the myxobacteria, has the highest number of LCRs and the highest enrichment for LCRs [8]. Intriguingly, four of the top five bacteria with the highest enrichment for LCRs are also myxobacteria [8].

The three most enriched amino acids within LCRs of Bacteria are proline, glycine and alanine, whereas in Archaea they are threonine, aspartate and proline [8]. In phages, they are alanine, glycine and proline [8]. Glycine and proline emerge as very enriched amino acids in all three evolutionary lineages, whereas alanine is highly enriched in Bacteria and phages but not enriched in Archaea. On the other hand, hydrophobic (M, I, L, V) and aromatic (F, Y, W) amino acids, as well as cysteine, arginine and asparagine, are heavily under-represented in LCRs [8]. Very similar trends for amino acids with a high (G, A, P, S, Q) and low (M, V, L, I, W, F, R, C) occurrence within LCRs have been observed in eukaryotes as well [24][21]. This pattern of certain amino acids being over-represented (enriched) or under-represented in LCRs could be partially explained by the energy cost of synthesizing or metabolizing each amino acid [8]. Another possible explanation, which does not exclude the energy-cost explanation, could be the reactivity of certain amino acids [8]. For example, cysteine is a very reactive amino acid that would not be tolerated in high numbers within a small region of a protein [25]. Similarly, extremely hydrophobic regions can form non-specific protein–protein interactions among themselves and with other moderately hydrophobic regions in mammalian cells [26][27]. Thus, their presence may disturb the balance of protein–protein interaction networks within the cell, especially if the carrier proteins are highly expressed [8]. A third explanation may be based on micro-evolutionary forces and, more specifically, on the bias of DNA polymerase slippage for certain di-, tri- or tetra-nucleotides [8].

Amino acid enrichment for certain functional categories of LCRs

A bioinformatics analysis of prokaryotic LCRs identified five types of amino acid enrichment for certain functional categories of LCRs [8]:

  • Proteins with GO terms related to polysaccharide binding and processing were enriched for serine and threonine in their LCRs.
  • Proteins with GO terms related to RNA binding and processing were enriched for arginine in their LCRs.
  • Proteins with GO terms related to DNA binding and processing were especially enriched for lysine, but also for glycine, tyrosine, phenylalanine and glutamine in their LCRs.
  • Proteins with GO terms related to metal binding and more specifically to cobalt or nickel-binding were enriched mostly for histidine but also for aspartate in their LCRs.
  • Proteins with GO terms related to protein folding were enriched for glycine, methionine and phenylalanine in their LCRs.

Based on the above observations and analyses, a Neural Network webserver named LCR-hound has been developed to predict LCRs and their function [8].

Evolution

LCRs are very interesting from both a micro- and a macro-evolutionary perspective [8]. They may be generated by DNA slippage, recombination and repair [28]. Thus, they are linked to recombination hotspots and may even facilitate cross-over [29][30]. Because they originate from genetic instability, they may cause, at the DNA level, a certain region of the protein to expand or contract, and may even cause frame-shifts (phase variants) that affect microbial pathogenicity or provide raw material for evolution [31]. Most intriguingly, they may provide a window into the very early evolution of life [8][32]. During early evolution, when only a few amino acids were available and the primary genetic code was still expanding its repertoire, the first proteins are assumed to have been short, repetitive and, therefore, of low complexity [33][34]. Thus, modern LCRs could represent primordial aspects of the evolution towards the protein world and may provide clues about the functions of the early proto-peptides [8].

Most studies have focused on the evolution and the functional and structural roles of eukaryotic LCRs [8]. However, a comprehensive study of prokaryotic LCRs from many diverse prokaryotic lineages provides a unique opportunity to understand the origin, evolution and nature of these regions. Owing to the high effective population size and short generation times of prokaryotes, the de novo emergence of a mildly or moderately deleterious amino acid repeat or LCR should quickly be filtered out by strong selective forces [8]. This must especially be the case for LCRs found in highly expressed proteins, since they should also have a great impact on the energy burden of protein translation [35][36]. Thus, any prokaryotic LCRs that constitute evolutionary accidents with no functional significance should not be fixed by genetic drift and consequently should not show any conservation among moderately distant evolutionary relatives [8]. Conversely, any LCR found among homologs of several moderately distant prokaryotic species very probably serves a functional role [8].

LCRs and the protopeptides of the early genetic code

The amino acids with the highest frequency in LCRs are glycine and alanine, with their respective codons GGC and GCC being the most frequent, as well as complementary [8]. In eukaryotes, and more specifically in chordates (such as human, mouse, chicken, zebrafish and sea squirt), alanine- and glycine-rich LCRs are over-represented among recently formed LCRs and are probably better tolerated by the cell [37]. Intriguingly, it has also been suggested that they represent the very first two amino acids [38] and codons [34][39][40] of the early genetic code. Thus, these two codons and their respective amino acids must have been constituents of the earliest oligopeptides, with a length of 10–55 amino acids [41] and very low complexity. Based on several different criteria and sources of data, Higgs and Pudritz [38] suggest G, A, D, E, V, S, P, I, L, T as the early amino acids of the genetic code. Trifonov's work largely agrees with this categorization and proposes that the early amino acids, in chronological order, are G, A, D, V, S, P, E, L, T, R. An evolutionary analysis observed that many of the amino acids of the suggested very early genetic code (with the exception of the hydrophobic ones) are significantly enriched in bacterial LCRs [8]. Most of the later additions to the genetic code are significantly under-represented in bacterial LCRs [8]. The authors thus hypothesize that, in a cell-free environment, the early genetic code may have also produced low complexity oligopeptides from valine and leucine [8]. However, later on, within a more complex cellular environment, these highly hydrophobic LCRs became inappropriate or even toxic from a protein interaction perspective and have been selected against ever since [8]. In addition, they further hypothesize that the very early protopeptides did not have a nucleic acid binding role [8], because DNA- and RNA-binding LCRs are highly enriched in glycine, arginine and lysine, yet arginine and lysine are not among the amino acids of the proposed early genetic code.
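The statement that the glycine codon GGC and the alanine codon GCC are complementary can be checked with a short Python sketch. It assumes that "complementary" is meant in the reverse-complement sense (the two codons pair on opposite DNA strands); the helper name `reverse_complement` and the two-entry codon table are introduced here purely for illustration.

```python
# Watson-Crick base pairing for DNA, used to build the complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the two codons discussed in the text, from the standard genetic code.
CODON_TO_AMINO_ACID = {"GGC": "Gly", "GCC": "Ala"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


for codon, amino_acid in CODON_TO_AMINO_ACID.items():
    partner = reverse_complement(codon)
    print(codon, amino_acid, "pairs with", partner, CODON_TO_AMINO_ACID[partner])
```

Under this reading, a repeat of GGC on one strand is read as a repeat of GCC on the other, so a single repetitive DNA tract could in principle encode either a glycine or an alanine run depending on the strand.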

Detection methods

Low complexity regions in proteins can be computationally detected from sequence using various methods and definitions, as reviewed in [2]. One of the most popular ways to identify LCRs is to measure their Shannon entropy [1]: the lower the calculated entropy, the more homogeneous the region is in amino acid content. In addition, a neural network web server, LCR-hound, has been developed to predict the function of an LCR from its amino acid or di-amino acid content [8].
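As a rough illustration of the entropy idea, the Python sketch below scores fixed-length windows of a sequence by the Shannon entropy of their amino acid composition and reports the windows that fall at or below a cut-off. It is a simplified sliding-window scan, not the segmentation procedure of the cited work; the function names, the window length of 12 and the threshold of 2.2 bits are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits per residue, of a segment's amino acid composition."""
    counts = Counter(segment)
    total = len(segment)
    # H = sum over residue types of p * log2(1 / p), with p = count / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def low_entropy_windows(sequence: str, window: int = 12, threshold: float = 2.2):
    """Return (start, entropy) for every window with entropy <= threshold.

    window and threshold are illustrative defaults, not published parameters.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        entropy = shannon_entropy(sequence[start:start + window])
        if entropy <= threshold:
            hits.append((start, entropy))
    return hits
```

A homopolymeric window such as twelve consecutive glutamines has an entropy of 0 bits, while a window in which all twelve residues differ reaches log2(12), about 3.58 bits. Overlapping hits would normally be merged into a single region, a step that is omitted here for brevity.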

References

  1. ^ a b Wootton, John C. (1994-09). "Non-globular domains in protein sequences: Automated segmentation using complexity measures". Computers & Chemistry. 18 (3): 269–285. doi:10.1016/0097-8485(94)85023-2.
  2. ^ a b Mier P, Paladin L, Tamana S, Petrosian S, Hajdu-Soltész B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernadó P, Gáspári Z, Ouzounis CA, Promponas VJ, Kajava AV, Hancock JM, Tosatto SC, Dosztanyi Z, Andrade-Navarro MA (30 January 2019). "Disentangling the complexity of low complexity proteins". Brief Bioinform. doi:10.1093/bib/bbz007. PMID 30698641.
  3. ^ a b Huntley, Melanie A.; Golding, G. Brian (2002-07-01). "Simple sequences are rare in the Protein Data Bank". Proteins: Structure, Function, and Genetics. 48 (1): 134–140. doi:10.1002/prot.10150. ISSN 0887-3585.
  4. ^ Kumari, Bandana; Kumar, Ravindra; Kumar, Manish (2015). "Low complexity and disordered regions of proteins have different structural and amino acid preferences". Molecular BioSystems. 11 (2): 585–594. doi:10.1039/C4MB00425F. ISSN 1742-206X.
  5. ^ Luo, H.; Nijveen, H. (2014-07-01). "Understanding and identifying amino acid repeats". Briefings in Bioinformatics. 15 (4): 582–591. doi:10.1093/bib/bbt003. ISSN 1467-5463. PMC 4103538. PMID 23418055.
  6. ^ Matsushima, Norio; Yoshida, Hitoshi; Kumaki, Yasuhiro; Kamiya, Masakatsu; Tanaka, Takanori; Izumi, Yoshinobu; Kretsinger, Robert H. (2008-11-30). "Flexible Structures and Ligand Interactions of Tandem Repeats Consisting of Proline, Glycine, Asparagine, Serine, and/or Threonine Rich Oligopeptides in Proteins". Current Protein & Peptide Science. doi:10.2174/138920308786733886. Retrieved 2020-11-03.
  7. ^ Adzhubei, Alexei A.; Sternberg, Michael J.E.; Makarov, Alexander A. (2013-06). "Polyproline-II Helix in Proteins: Structure and Function". Journal of Molecular Biology. 425 (12): 2100–2132. doi:10.1016/j.jmb.2013.03.018.
  8. ^ a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae af ag ah Ntountoumi, Chrysa; Vlastaridis, Panayotis; Mossialos, Dimitris; Stathopoulos, Constantinos; Iliopoulos, Ioannis; Promponas, Vasilios; Oliver, Stephen G; Amoutzias, Grigoris D (2019-11-04). "Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved". Nucleic Acids Research. 47 (19): 9998–10009. doi:10.1093/nar/gkz730. ISSN 0305-1048. PMC 6821194. PMID 31504783.
  9. ^ Karlin, S.; Brocchieri, L.; Bergman, A.; Mrazek, J.; Gentles, A. J. (2002-01-08). "Amino acid runs in eukaryotic proteomes and disease associations". Proceedings of the National Academy of Sciences. 99 (1): 333–338. doi:10.1073/pnas.012608599. ISSN 0027-8424. PMC 117561. PMID 11782551.
  10. ^ Mirkin, Sergei M. (2007-06). "Expandable DNA repeats and human disease". Nature. 447 (7147): 932–940. doi:10.1038/nature05977. ISSN 0028-0836.
  11. ^ Kumari, Bandana; Kumar, Ravindra; Chauhan, Vipin; Kumar, Manish (2018-10-30). "Comparative functional analysis of proteins containing low-complexity predicted amyloid regions". PeerJ. 6: e5823. doi:10.7717/peerj.5823. ISSN 2167-8359. PMC 6214233. PMID 30397544.
  12. ^ So, Christopher R.; Fears, Kenan P.; Leary, Dagmar H.; Scancella, Jenifer M.; Wang, Zheng; Liu, Jinny L.; Orihuela, Beatriz; Rittschof, Dan; Spillmann, Christopher M.; Wahl, Kathryn J. (2016-12). "Sequence basis of Barnacle Cement Nanostructure is Defined by Proteins with Silk Homology". Scientific Reports. 6 (1): 36219. doi:10.1038/srep36219. ISSN 2045-2322. PMC 5099703. PMID 27824121.
  13. ^ Haritos, Victoria S.; Niranjane, Ajay; Weisman, Sarah; Trueman, Holly E.; Sriskantha, Alagacone; Sutherland, Tara D. (2010-11-07). "Harnessing disorder: onychophorans use highly unstructured proteins, not silks, for prey capture". Proceedings of the Royal Society B: Biological Sciences. 277 (1698): 3255–3263. doi:10.1098/rspb.2010.0604. ISSN 0962-8452. PMC 2981920. PMID 20519222.
  14. ^ Brewer, S.; Tolley, M.; Trayer, I.P.; Barr, G.C.; Dorman, C.J.; Hannavy, K.; Higgins, C.F.; Evans, J.S.; Levine, B.A.; Wormald, M.R. (1990-12). "Structure and function of X-Pro dipeptide repeats in the TonB proteins of Salmonella typhimurium and Escherichia coli". Journal of Molecular Biology. 216 (4): 883–895. doi:10.1016/S0022-2836(99)80008-4.
  15. ^ Robison, Aaron D.; Sun, Simou; Poyton, Matthew F.; Johnson, Gregory A.; Pellois, Jean-Philippe; Jungwirth, Pavel; Vazdar, Mario; Cremer, Paul S. (2016-09-08). "Polyarginine Interacts More Strongly and Cooperatively than Polylysine with Phospholipid Bilayers". The Journal of Physical Chemistry B. 120 (35): 9287–9296. doi:10.1021/acs.jpcb.6b05604. ISSN 1520-6106. PMC 5912336. PMID 27571288.
  16. ^ a b Zhu, Z. Y.; Karlin, S. (1996-08-06). "Clusters of charged residues in protein three-dimensional structures". Proceedings of the National Academy of Sciences. 93 (16): 8350–8355. doi:10.1073/pnas.93.16.8350. ISSN 0027-8424. PMC 38674. PMID 8710874.
  17. ^ Kushwaha, Ambuj K.; Grove, Anne (2013-02-01). "C-terminal low-complexity sequence repeats of Mycobacterium smegmatis Ku modulate DNA binding". Bioscience Reports. 33 (1): e00016. doi:10.1042/BSR20120105. ISSN 0144-8463. PMC 3553676. PMID 23167261.
  18. ^ Frugier, Magali; Bour, Tania; Ayach, Maya; Santos, Manuel A.S.; Rudinger-Thirion, Joëlle; Théobald-Dietrich, Anne; Pizzi, Elizabetta (2010-01-21). "Low Complexity Regions behave as tRNA sponges to help co-translational folding of plasmodial proteins". FEBS Letters. 584 (2): 448–454. doi:10.1016/j.febslet.2009.11.004.
  19. ^ Tyedmers, Jens; Mogk, Axel; Bukau, Bernd (2010-11). "Cellular strategies for controlling protein aggregation". Nature Reviews Molecular Cell Biology. 11 (11): 777–788. doi:10.1038/nrm2993. ISSN 1471-0072.
  20. ^ Ling, Jiqiang; Cho, Chris; Guo, Li-Tao; Aerni, Hans R.; Rinehart, Jesse; Söll, Dieter (2012-12). "Protein Aggregation Caused by Aminoglycoside Action Is Prevented by a Hydrogen Peroxide Scavenger". Molecular Cell. 48 (5): 713–722. doi:10.1016/j.molcel.2012.10.001. PMC 3525788. PMID 23122414.
  21. ^ a b Haerty, Wilfried; Golding, G. Brian (2010-10). Bonen, Linda (ed.). "Low-complexity sequences and single amino acid repeats: not just "junk" peptide sequences". Genome. 53 (10): 753–762. doi:10.1139/G10-063. ISSN 0831-2796.
  22. ^ Faux, N. G. (2005-03-21). "Functional insights from the distribution and role of homopeptide repeat-containing proteins". Genome Research. 15 (4): 537–551. doi:10.1101/gr.3096505. ISSN 1088-9051. PMC 1074368. PMID 15805494.
  23. ^ Albà, M.M.; Tompa, P.; Veitia, R.A. (2007), Volff, J.-N. (ed.), "Amino Acid Repeats and the Structure and Evolution of Proteins", Genome Dynamics, Basel: KARGER, pp. 119–130, doi:10.1159/000107607, ISBN 978-3-8055-8340-4, retrieved 2020-11-03
  24. ^ Marcotte, Edward M.; Pellegrini, Matteo; Yeates, Todd O.; Eisenberg, David (1999-10). "A census of protein repeats". Journal of Molecular Biology. 293 (1): 151–160. doi:10.1006/jmbi.1999.3136.
  25. ^ Marino, Stefano M.; Gladyshev, Vadim N. (2012-02-10). "Analysis and Functional Prediction of Reactive Cysteine Residues". Journal of Biological Chemistry. 287 (7): 4419–4425. doi:10.1074/jbc.R111.275578. ISSN 0021-9258. PMC 3281665. PMID 22157013.
  26. ^ Dorsman, J. C. (2002-06-15). "Strong aggregation and increased toxicity of polyleucine over polyglutamine stretches in mammalian cells". Human Molecular Genetics. 11 (13): 1487–1496. doi:10.1093/hmg/11.13.1487.
  27. ^ Oma, Yoko; Kino, Yoshihiro; Sasagawa, Noboru; Ishiura, Shoichi (2004-05-14). "Intracellular Localization of Homopolymeric Amino Acid-containing Proteins Expressed in Mammalian Cells". Journal of Biological Chemistry. 279 (20): 21217–21222. doi:10.1074/jbc.M309887200. ISSN 0021-9258.
  28. ^ Ellegren, Hans (2004-06). "Microsatellites: simple sequences with complex evolution". Nature Reviews Genetics. 5 (6): 435–445. doi:10.1038/nrg1348. ISSN 1471-0056.
  29. ^ Verstrepen, Kevin J; Jansen, An; Lewitter, Fran; Fink, Gerald R (2005-09). "Intragenic tandem repeats generate functional variability". Nature Genetics. 37 (9): 986–990. doi:10.1038/ng1618. ISSN 1061-4036. PMC 1462868. PMID 16086015.
  30. ^ Siwach, Pratibha; Pophaly, Saurabh Dilip; Ganesh, Subramaniam (2006-07-01). "Genomic and Evolutionary Insights into Genes Encoding Proteins with Single Amino Acid Repeats". Molecular Biology and Evolution. 23 (7): 1357–1369. doi:10.1093/molbev/msk022. ISSN 1537-1719.
  31. ^ Moxon, Richard; Bayliss, Chris; Hood, Derek (2006-12). "Bacterial Contingency Loci: The Role of Simple Sequence DNA Repeats in Bacterial Adaptation". Annual Review of Genetics. 40 (1): 307–333. doi:10.1146/annurev.genet.40.110405.090442. ISSN 0066-4197.
  32. ^ Toll-Riera, M.; Rado-Trilla, N.; Martys, F.; Alba, M. M. (2012-03-01). "Role of Low-Complexity Sequences in the Formation of Novel Protein Coding Sequences". Molecular Biology and Evolution. 29 (3): 883–886. doi:10.1093/molbev/msr263. ISSN 0737-4038.
  33. ^ Ohno, S.; Epplen, J. T. (1983-06-01). "The primitive code and repeats of base oligomers as the primordial protein-encoding sequence". Proceedings of the National Academy of Sciences. 80 (11): 3391–3395. doi:10.1073/pnas.80.11.3391. ISSN 0027-8424. PMC 394049. PMID 6574491.
  34. ^ a b Trifonov, Edward N. (2009-09). "The origin of the genetic code and of the earliest oligopeptides". Research in Microbiology. 160 (7): 481–486. doi:10.1016/j.resmic.2009.05.004.
  35. ^ Akashi, Hiroshi; Gojobori, Takashi (2002-03-19). "Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis". Proceedings of the National Academy of Sciences. 99 (6): 3695–3700. doi:10.1073/pnas.062526999. ISSN 0027-8424. PMC 122586. PMID 11904428.
  36. ^ Barton, Michael D.; Delneri, Daniela; Oliver, Stephen G.; Rattray, Magnus; Bergman, Casey M. (2010-08-17). Bähler, Jürg (ed.). "Evolutionary Systems Biology of Amino Acid Biosynthetic Cost in Yeast". PLoS ONE. 5 (8): e11935. doi:10.1371/journal.pone.0011935. ISSN 1932-6203. PMC 2923148. PMID 20808905.
  37. ^ Radó-Trilla, Núria; Albà, MMar (2012). "Dissecting the role of low-complexity regions in the evolution of vertebrate proteins". BMC Evolutionary Biology. 12 (1): 155. doi:10.1186/1471-2148-12-155. ISSN 1471-2148. PMC 3523016. PMID 22920595.
  38. ^ a b Higgs, Paul G.; Pudritz, Ralph E. (2009-06). "A Thermodynamic Basis for Prebiotic Amino Acid Synthesis and the Nature of the First Genetic Code". Astrobiology. 9 (5): 483–490. doi:10.1089/ast.2008.0280. ISSN 1531-1074.
  39. ^ Trifonov, E.N (2000-12). "Consensus temporal order of amino acids and evolution of the triplet code". Gene. 261 (1): 139–151. doi:10.1016/S0378-1119(00)00476-5.
  40. ^ Trifonov, Edward N. (2004-08-01). "The Triplet Code From First Principles". Journal of Biomolecular Structure and Dynamics. 22 (1): 1–11. doi:10.1080/07391102.2004.10506975. ISSN 0739-1102. PMID 15214800.
  41. ^ Ferris, James P.; Hill, Aubrey R.; Liu, Rihe; Orgel, Leslie E. (1996-05). "Synthesis of long prebiotic oligomers on mineral surfaces". Nature. 381 (6577): 59–61. doi:10.1038/381059a0. ISSN 0028-0836.