Micropeptides are commonly defined as polypeptides shorter than a somewhat arbitrary cutoff of 100-150 amino acids (aa). They are distinguished from bioactive peptides by their origin: micropeptides are translated directly from short open reading frames (sORFs), whereas bioactive peptides are cleavage products of larger polypeptides. They also differ from canonical proteins, which have average lengths of 330 and 449 amino acids in prokaryotes and eukaryotes, respectively. Micropeptides, also known as microproteins or sORF-encoded peptides, can also be named according to their genomic location. For example, the translated product of an upstream open reading frame (uORF) might be called a uORF-encoded peptide (uPEP). Micropeptides lack an N-terminal signal sequence, which suggests that they remain in the cytoplasm after translation. However, as more micropeptides have been studied, some have been found in other cell compartments, as indicated by the existence of transmembrane micropeptides. They are expressed in both prokaryotic and eukaryotic organisms. The sORFs from which micropeptides are translated can be encoded in a variety of genomic regions, including 5' UTRs, small genes, polycistronic mRNAs, and genes originally characterized as long non-coding RNAs (lncRNAs).
Given their small size, sORFs were originally overlooked. However, hundreds of thousands of putative micropeptides have since been identified through various techniques in a multitude of organisms. Only a small fraction of those with coding potential have had their expression and function confirmed. Those that have been functionally characterized generally have roles in cell signaling, organogenesis, and cellular physiology. As more micropeptides are discovered, so are more of their functions. One regulatory role is that of peptoswitches: micropeptides that, when directly or indirectly activated by small molecules, stall ribosomes and thereby inhibit expression of downstream coding sequences.
- 1 Techniques for identifying potential micropeptides
- 2 Validating protein-coding potential
- 3 Databases and Repositories
- 4 Prokaryotic examples
- 5 Eukaryotic examples
- 6 References
Techniques for identifying potential micropeptides
Various experimental techniques exist for identifying sORFs and their translational products. These techniques are useful only for identifying sORFs that may produce micropeptides, not for direct functional characterization.
RNA sequencing
One method for finding potential sORFs, and therefore micropeptides, is RNA sequencing (RNA-Seq). RNA-Seq uses next-generation sequencing (NGS) to determine which RNAs are expressed in a given cell, tissue, or organism at a specific point in time. This collection of data, known as a transcriptome, can then be used as a resource for finding potential sORFs. Because sORFs shorter than 100 aa are very likely to occur by chance, further study is necessary to determine the validity of data obtained using this method.
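As a rough illustration of how a transcriptome can be mined for candidate sORFs, and of why chance occurrence is a concern, the following Python sketch scans a transcript for ATG-initiated reading frames below a length cutoff. The function names, the 10 aa lower bound, and the uniform random-codon model are illustrative assumptions and do not correspond to any published pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(transcript, max_aa=100, min_aa=10):
    """Return (start, end, n_aa) for each ATG-initiated ORF of min_aa..max_aa codons.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Only the forward strand is scanned. The length bounds are assumptions.
    """
    seq = transcript.upper().replace("U", "T")
    hits = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start codon to the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_aa = (pos - start) // 3  # codons before the stop, including ATG
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, pos + 3, n_aa))
                break
    return hits


def chance_open_frame(n_codons):
    """Probability that n_codons random codons contain no stop codon.

    Assumes all 64 codons are equally likely, which real genomes violate.
    """
    return (61 / 64) ** n_codons


if __name__ == "__main__":
    example = "GG" + "ATG" + "GCT" * 11 + "TAA" + "CC"
    print(find_sorfs(example))
    print(chance_open_frame(20), chance_open_frame(100))
```

Under this simplified model a random frame stays open for 20 codons with a probability of roughly 0.38, but for 100 codons with a probability below 0.01. This is one way to see why short ORFs found in a transcriptome need independent evidence of translation.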
Ribosome profiling (Ribo-Seq)
Ribosome profiling has been used to identify potential micropeptides in a growing number of organisms, including fruit flies, zebrafish, mice and humans. One method uses compounds such as harringtonine, puromycin or lactimidomycin to stop ribosomes at translation initiation sites, which indicates where active translation is taking place. Translation elongation inhibitors, such as emetine or cycloheximide, may also be used to obtain ribosome footprints, which are more likely to correspond to a translated ORF. If a ribosome is bound at or near a sORF, that sORF putatively encodes a micropeptide.
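The final inference, that footprints at or near a sORF mark it as putatively translated, amounts to an interval-overlap count. Below is a minimal sketch, assuming the footprints have already been aligned and reduced to half-open transcript coordinates. The function names and the threshold of two footprints are hypothetical.

```python
def count_footprints(sorfs, footprints):
    """Count ribosome footprints overlapping each sORF.

    sorfs: dict mapping a sORF name to a (start, end) half-open interval.
    footprints: iterable of (start, end) half-open intervals on the same transcript.
    """
    counts = {name: 0 for name in sorfs}
    for f_start, f_end in footprints:
        for name, (o_start, o_end) in sorfs.items():
            # Two half-open intervals overlap when each starts before the other ends.
            if f_start < o_end and o_start < f_end:
                counts[name] += 1
    return counts


def putatively_translated(sorfs, footprints, min_footprints=2):
    """Names of sORFs covered by at least min_footprints footprints (assumed cutoff)."""
    counts = count_footprints(sorfs, footprints)
    return sorted(name for name, n in counts.items() if n >= min_footprints)


if __name__ == "__main__":
    sorfs = {"sORF_A": (100, 160), "sORF_B": (300, 345)}
    footprints = [(95, 124), (110, 139), (150, 179), (400, 429)]
    print(count_footprints(sorfs, footprints))
    print(putatively_translated(sorfs, footprints))
```

A raw count like this only flags candidates. It says nothing about whether a stable peptide is produced.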
Mass spectrometry
Mass spectrometry (MS) is the gold standard for identifying and sequencing proteins. Using this technique, investigators can determine whether polypeptides are in fact translated from a sORF.
Proteogenomics
Proteogenomics combines proteomics, genomics, and transcriptomics, which makes it well suited to the search for potential micropeptides. One approach uses RNA-Seq data to build a custom database of all possible polypeptides. Liquid chromatography followed by tandem MS (LC-MS/MS) is then performed to provide sequence information for translation products. Comparing the transcriptomic and proteomic data can confirm the presence of micropeptides.
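The database-and-compare workflow can be sketched in Python as follows. This is an illustrative outline under stated assumptions. Transcripts are translated in the three forward frames with the standard genetic code, and the LC-MS/MS results are assumed to be already available as peptide sequence strings. The names `build_database` and `match_peptides` are hypothetical, and real search engines score spectra rather than matching strings.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def build_database(transcripts, max_aa=100):
    """Map each candidate micropeptide to the transcripts that could encode it.

    A candidate runs from the first methionine of a stop-terminated segment
    to that stop, in any of the three forward frames.
    """
    db = {}
    for name, seq in transcripts.items():
        seq = seq.upper().replace("U", "T")
        for frame in range(3):
            # Drop the last segment: it is not terminated by a stop codon.
            for segment in translate(seq[frame:]).split("*")[:-1]:
                start = segment.find("M")
                if start == -1:
                    continue
                peptide = segment[start:]
                if len(peptide) <= max_aa:
                    db.setdefault(peptide, set()).add(name)
    return db


def match_peptides(db, observed):
    """For each MS-derived peptide string, list transcripts whose candidates contain it."""
    result = {}
    for obs in observed:
        names = set()
        for peptide, sources in db.items():
            if obs in peptide:
                names |= sources
        result[obs] = sorted(names)
    return result


if __name__ == "__main__":
    db = build_database({"t1": "CCATGGCTAAAGGTTGGTAAG"})
    print(db)
    print(match_peptides(db, ["AKGW", "QQQ"]))
```

An observed peptide that maps to a candidate in the custom database but to no annotated protein is the kind of evidence that points to a translated sORF.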
Phylogenetic conservation
Phylogenetic conservation can be a useful tool, particularly when sifting through a large database of sORFs. A sORF is more likely to encode a functional micropeptide if it is conserved across numerous species. However, this criterion does not work for all sORFs. For example, those encoded by lncRNAs are less likely to be conserved, given that lncRNAs themselves do not have high sequence conservation. Further experimentation is necessary to determine whether a functional micropeptide is in fact produced.
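One simple way to put a number on conservation is the fraction of alignment columns that are identical in every species. The sketch below assumes the orthologous peptide sequences have already been aligned to equal length, with "-" marking gaps. The function name and the scoring rule are illustrative assumptions; dedicated tools use more refined evolutionary models.

```python
def column_conservation(aligned):
    """Fraction of alignment columns with the same residue in every sequence.

    aligned: list of equal-length, pre-aligned peptide strings ('-' marks a gap).
    Columns containing a gap are counted as not conserved.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    conserved = sum(
        1 for column in zip(*aligned)
        if "-" not in column and len(set(column)) == 1
    )
    return conserved / length


if __name__ == "__main__":
    # Hypothetical aligned orthologs of a 5-residue peptide.
    print(column_conservation(["MKTLL", "MKSLL", "MK-LL"]))
```

A score like this can be used to rank candidates in a large sORF collection. As noted above, a low score does not rule out function.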
Validating protein-coding potential
Antibodies
Custom antibodies targeted to the micropeptide of interest can be useful for quantifying expression or determining intracellular localization. As is the case with most proteins, low expression may make detection difficult. The small size of a micropeptide can also make it difficult to find an epitope against which to raise the antibody.
Tagging with CRISPR-Cas9
Genome editing can be used to add FLAG/MYC or other small peptide tags to an endogenous sORF, thus creating fusion proteins. In most cases, this method is beneficial in that it can be performed more quickly than developing a custom antibody. It is also useful for micropeptides for which no epitope can be targeted.
In Vitro Translation
This process entails cloning the full-length micropeptide cDNA into a plasmid containing a T7 or SP6 promoter. This method utilizes a cell-free protein-synthesizing system in the presence of 35S-methionine to produce the peptide of interest. The products can then be analyzed by gel electrophoresis and the 35S-labeled peptide is visualized using autoradiography.
Databases and Repositories
Several repositories and databases have been created for both sORFs and micropeptides. A repository of small ORFs discovered by ribosome profiling can be found at sORFs.org. A repository of putative sORF-encoded peptides in Arabidopsis thaliana can be found at ARA-PEPs. A database of small proteins, especially those encoded by non-coding RNAs, can be found at SmProt.
Prokaryotic examples
To date, most micropeptides have been identified in prokaryotic organisms. While most have yet to be fully characterized, many of those that have been studied appear to be critical to the survival of these organisms. Because of their small size, prokaryotes are particularly susceptible to changes in their environment, and as such have developed methods to ensure their existence.
Escherichia coli (E. coli)
Micropeptides expressed in E. coli exemplify bacterial environmental adaptations. Most of these have been classified into three groups: leader peptides, ribosomal proteins, and toxic proteins. Leader peptides regulate transcription and/or translation of proteins involved in amino acid metabolism when amino acids are scarce. Ribosomal proteins include L36 (rpmJ) and L34 (rpmH), two components of the 50S ribosomal subunit. Toxic proteins, such as ldrD, can kill the host cell or inhibit its growth when expressed at high levels.
Salmonella enterica (S. enterica)
In S. enterica, the MgtC virulence factor is involved in adaptation to low-magnesium environments. The hydrophobic peptide MgtR binds to MgtC, causing its degradation by the FtsH protease.
Bacillus subtilis (B. subtilis)
The 46 aa Sda micropeptide, expressed by B. subtilis, represses sporulation when replication initiation is impaired. By inhibiting the histidine kinase KinA, Sda prevents the activation of the transcription factor Spo0A, which is required for sporulation.
Staphylococcus aureus (S. aureus)
In S. aureus, a group of micropeptides 20-22 aa in length is secreted during host infection to disrupt neutrophil membranes, causing cell lysis. These micropeptides allow the bacterium to avoid destruction by one of the human immune system's main defenses.
Eukaryotic examples
Micropeptides have been discovered in eukaryotic organisms from Arabidopsis thaliana to humans. They play diverse roles in tissue and organ development, as well as maintenance and function once fully developed. While many are yet to be functionally characterized, and likely more remain to be discovered, below is a summary of recently identified eukaryotic micropeptide functions.
Arabidopsis thaliana (A. thaliana)
The POLARIS (PLS) gene encodes a 36 aa micropeptide. It is necessary for proper leaf vascular patterning and cell expansion in the root. This micropeptide interacts with developmental PIN proteins to form a critical network for hormonal crosstalk between auxin, ethylene, and cytokinin.
ROTUNDIFOLIA4 (ROT4) in A. thaliana encodes a 53 aa peptide, which localizes to the plasma membrane of leaf cells. The mechanism of ROT4 function is not well understood, but plants overexpressing it have short, rounded leaves, indicating that this peptide may be important in leaf morphogenesis.
Zea mays (Z. mays)
Brick1 (Brk1) encodes a 76 aa micropeptide, which is highly conserved in both plants and animals. In Z. mays, it was found to be involved in morphogenesis of leaf epithelia, promoting multiple actin-dependent cell polarization events in the developing leaf epidermis. Zm401p10 is an 89 aa micropeptide that plays a role in normal pollen development in the tapetum. After mitosis it is also essential for the degradation of the tapetum. Zm908p11 is a micropeptide 97 aa in length, encoded by the Zm908 gene, which is expressed in mature pollen grains. It localizes to the cytoplasm of pollen tubes, where it aids in their growth and development.
Drosophila melanogaster (D. melanogaster)
The evolutionarily conserved polished rice (pri) gene, known as tarsal-less (tal) in D. melanogaster, is involved in epidermal differentiation. This polycistronic transcript encodes four similar peptides, ranging from 11 to 32 aa in length. They function to truncate the transcription factor Shavenbaby (Svb). This converts Svb into an activator that directly regulates the expression of target effectors, including miniature (m) and shavenoid (sha), which are together responsible for trichome formation.
Danio rerio (D. rerio)
The Toddler (tdl) gene is believed to be important for embryogenesis, and is specifically expressed during late blastula and gastrula stages. During gastrulation, it is critical in promoting the internalization and animal-pole-directed movement of mesendodermal cells. After gastrulation, it is expressed in the lateral mesoderm, the endoderm, and the anterior and posterior notochord. Although it is annotated as a lncRNA in zebrafish, mouse, and human, the 58 aa sORF was found to be highly conserved in vertebrates.
Mus musculus (M. musculus)
Myoregulin (Mln) is encoded by a gene originally annotated as a lncRNA. Mln is expressed in all three types of skeletal muscle, and works similarly to the micropeptides phospholamban (Pln) in cardiac muscle and sarcolipin (Sln) in slow (Type I) skeletal muscle. These micropeptides interact with sarcoplasmic reticulum Ca2+-ATPase (SERCA), a membrane pump responsible for Ca2+ uptake into the sarcoplasmic reticulum (SR). Because Ca2+ uptake into the SR is what allows muscle to relax, inhibiting SERCA lets these micropeptides regulate muscle contraction and relaxation. Similarly, the endoregulin (ELN) and another-regulin (ALN) genes code for transmembrane micropeptides that contain the SERCA binding motif, and are conserved in mammals.
Myomixer (Mymx), a muscle-specific peptide 84 aa in length, is encoded by the gene Gm7325 and plays a role during embryogenesis in myoblast fusion and skeletal muscle formation. It localizes to the plasma membrane, where it associates with a fusogenic membrane protein, Myomaker (Mymk). In humans, the gene encoding Mymx is annotated as uncharacterized LOC101929726. Orthologs are found in the turtle, frog and fish genomes as well.
Homo sapiens (H. sapiens)
In humans, NoBody (non-annotated P-body dissociating polypeptide), a 68 aa micropeptide, was discovered in the long intervening noncoding RNA (lincRNA) LINC01420. It has high sequence conservation among mammals, and localizes to P-bodies. It enriches proteins associated with 5’ mRNA decapping. It is thought to interact directly with Enhancer of mRNA Decapping 4 (EDC4).
The C7orf49 gene, which is conserved in mammals, is predicted to produce three micropeptides through alternative splicing. MRI-1 was previously found to be a modulator of retrovirus infection. The second predicted micropeptide, MRI-2, may be important in non-homologous end joining (NHEJ) of DNA double-strand breaks. In co-immunoprecipitation experiments, MRI-2 bound to Ku70 and Ku80, the two subunits of Ku, which plays a major role in the NHEJ pathway.
The 24 amino acid micropeptide, Humanin (HN), interacts with the apoptosis-inducing protein Bcl2-associated X protein (Bax). In its active state, Bax undergoes a conformational change which exposes membrane-targeting domains. This causes it to move from the cytosol to the mitochondrial membrane, where it inserts and releases apoptogenic proteins such as cytochrome c. By interacting with Bax, HN prevents Bax targeting of the mitochondria, thereby blocking apoptosis.
A 90 aa micropeptide, ‘Small Regulatory Polypeptide of Amino Acid Response’ or SPAAR, was found to be encoded in the lncRNA LINC00961. It is conserved between human and mouse, and localizes to the late endosome/lysosome. SPAAR interacts with four subunits of the v-ATPase complex, inhibiting mTORC1 translocation to the lysosomal surface, where it is activated. Down-regulation of this micropeptide enables mTORC1 activation by amino acid stimulation, promoting muscle regeneration.
References
- Crappé J, Van Criekinge W, Menschaert G (2014). "Little things make big things happen: A summary of micropeptide encoding genes". EuPA Open Proteomics. 3: 128–137. doi:10.1016/j.euprot.2014.02.006.
- Makarewich CA, Olson EN (September 2017). "Mining for Micropeptides". Trends in Cell Biology. 27 (9): 685–696. doi:10.1016/j.tcb.2017.04.006. PMID 28528987.
- Guillén G, Díaz-Camino C, Loyola-Torres CA, Aparicio-Fabre R, Hernández-López A, Díaz-Sánchez M, Sanchez F (2013). "Detailed analysis of putative genes encoding small proteins in legume genomes". Frontiers in Plant Science. 4: 208. doi:10.3389/fpls.2013.00208. PMID 23802007.
- Hashimoto Y, Kondo T, Kageyama Y (June 2008). "Lilliputians get into the limelight: novel class of small peptide genes in morphogenesis". Development, Growth & Differentiation. 50 Suppl 1: S269–76. doi:10.1111/j.1440-169x.2008.00994.x. PMID 18459982.
- Zhang J (March 2000). "Protein-length distributions for the three domains of life". Trends in Genetics. 16 (3): 107–9. doi:10.1016/s0168-9525(99)01922-8. PMID 10689349.
- Rothnagel J, Menschaert G (May 2018). "Short Open Reading Frames and Their Encoded Peptides". Proteomics. 18 (10): e1700035. doi:10.1002/pmic.201700035. PMID 29691985.
- Anderson DM, Anderson KM, Chang CL, Makarewich CA, Nelson BR, McAnally JR, Kasaragod P, Shelton JM, Liou J, Bassel-Duby R, Olson EN (February 2015). "A micropeptide encoded by a putative long noncoding RNA regulates muscle performance". Cell. 160 (4): 595–606. doi:10.1016/j.cell.2015.01.009. PMID 25640239.
- Bi P, Ramirez-Martinez A, Li H, Cannavino J, McAnally JR, Shelton JM, Sánchez-Ortiz E, Bassel-Duby R, Olson EN (April 2017). "Control of muscle formation by the fusogenic micropeptide myomixer". Science. 356 (6335): 323–327. doi:10.1126/science.aam9361. PMID 28386024.
- Alix E, Blanc-Potard AB (February 2008). "Peptide-assisted degradation of the Salmonella MgtC virulence factor". The EMBO Journal. 27 (3): 546–57. doi:10.1038/sj.emboj.7601983. PMID 18200043.
- Burkholder WF, Kurtser I, Grossman AD (January 2001). "Replication initiation proteins regulate a developmental checkpoint in Bacillus subtilis". Cell. 104 (2): 269–79. doi:10.1016/s0092-8674(01)00211-2. PMID 11207367.
- Andrews SJ, Rothnagel JA (March 2014). "Emerging evidence for functional peptides encoded by short open reading frames". Nature Reviews. Genetics. 15 (3): 193–204. doi:10.1038/nrg3520. PMID 24514441.
- Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, Vejnar CE, Lee MT, Rajewsky N, Walther TC, Giraldez AJ (May 2014). "Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation". The EMBO Journal. 33 (9): 981–93. doi:10.1002/embj.201488411. PMID 24705786.
- Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJ, Jackson SE, Wills MR, Weissman JS (September 2014). "Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes". Cell Reports. 8 (5): 1365–79. doi:10.1016/j.celrep.2014.07.045. PMID 25159147.
- Stern-Ginossar N, Ingolia NT (November 2015). "Ribosome Profiling as a Tool to Decipher Viral Complexity". Annual Review of Virology. 2 (1): 335–49. doi:10.1146/annurev-virology-100114-054854. PMID 26958919.
- "sORFs.org: repository of small ORFs identified by ribosome profiling". sorfs.org. Retrieved 2018-12-14.
- Olexiouk V, Crappé J, Verbruggen S, Verhegen K, Martens L, Menschaert G (January 2016). "sORFs.org: a repository of small ORFs identified by ribosome profiling". Nucleic Acids Research. 44 (D1): D324–9. doi:10.1093/nar/gkv1175. PMC 4702841. PMID 26527729.
- "ARA-PEPs: A Repository of putative sORF-encoded peptides in Arabidopsis thaliana". www.biw.kuleuven.be. Retrieved 2018-12-14.
- Hazarika RR, De Coninck B, Yamamoto LR, Martin LR, Cammue BP, van Noort V (January 2017). "ARA-PEPs: a repository of putative sORF-encoded peptides in Arabidopsis thaliana". BMC Bioinformatics. 18 (1): 37. doi:10.1186/s12859-016-1458-y. PMID 28095775.
- "SmProt: a database of small proteins encoded by annotated coding and non-coding RNA loci". bioinfo.ibp.ac.cn. Retrieved 2018-12-14.
- Hao Y, Zhang L, Niu Y, Cai T, Luo J, He S, Zhang B, Zhang D, Qin Y, Yang F, Chen R (July 2018). "SmProt: a database of small proteins encoded by annotated coding and non-coding RNA loci". Briefings in Bioinformatics. 19 (4): 636–643. doi:10.1093/bib/bbx005. PMID 28137767.
- Hemm MR, Paul BJ, Schneider TD, Storz G, Rudd KE (December 2008). "Small membrane proteins found by comparative genomics and ribosome binding site models". Molecular Microbiology. 70 (6): 1487–501. doi:10.1111/j.1365-2958.2008.06495.x. PMID 19121005.
- Wang R, Braughton KR, Kretschmer D, Bach TH, Queck SY, Li M, Kennedy AD, Dorward DW, Klebanoff SJ, Peschel A, DeLeo FR, Otto M (December 2007). "Identification of novel cytolytic peptides as key virulence determinants for community-associated MRSA". Nature Medicine. 13 (12): 1510–4. doi:10.1038/nm1656. PMID 17994102.
- Hemm MR, Paul BJ, Miranda-Ríos J, Zhang A, Soltanzad N, Storz G (January 2010). "Small stress response proteins in Escherichia coli: proteins missed by classical proteomic studies". Journal of Bacteriology. 192 (1): 46–58. doi:10.1128/jb.00872-09. PMID 19734316.
- Casson SA, Chilley PM, Topping JF, Evans IM, Souter MA, Lindsey K (August 2002). "The POLARIS gene of Arabidopsis encodes a predicted peptide required for correct root growth and leaf vascular patterning". The Plant Cell. 14 (8): 1705–21. doi:10.1105/tpc.002618. PMID 12172017.
- Chilley PM, Casson SA, Tarkowski P, Hawkins N, Wang KL, Hussey PJ, Beale M, Ecker JR, Sandberg GK, Lindsey K (November 2006). "The POLARIS peptide of Arabidopsis regulates auxin transport and root growth via effects on ethylene signaling". The Plant Cell. 18 (11): 3058–72. doi:10.1105/tpc.106.040790. PMID 17138700.
- Liu J, Mehdi S, Topping J, Friml J, Lindsey K (2013). "Interaction of PLS and PIN and hormonal crosstalk in Arabidopsis root development". Frontiers in Plant Science. 4: 75. doi:10.3389/fpls.2013.00075. PMID 23577016.
- Narita NN, Moore S, Horiguchi G, Kubo M, Demura T, Fukuda H, Goodrich J, Tsukaya H (May 2004). "Overexpression of a novel small peptide ROTUNDIFOLIA4 decreases cell proliferation and alters leaf shape in Arabidopsis thaliana". The Plant Journal. 38 (4): 699–713. doi:10.1111/j.1365-313x.2004.02078.x. PMID 15125775.
- Frank MJ, Smith LG (May 2002). "A small, novel protein highly conserved in plants and animals promotes the polarized growth and division of maize leaf epidermal cells". Current Biology. 12 (10): 849–53. doi:10.1016/s0960-9822(02)00819-9. PMID 12015123.
- Wang D, Li C, Zhao Q, Zhao L, Wang M, Zhu D, Ao G, Yu J (2009). "Zm401p10, encoded by an anther-specific gene with short open reading frames, is essential for tapetum degeneration and anther development in maize". Functional Plant Biology. 36 (1): 73. doi:10.1071/fp08154.
- Dong X, Wang D, Liu P, Li C, Zhao Q, Zhu D, Yu J (May 2013). "Zm908p11, encoded by a short open reading frame (sORF) gene, functions in pollen tube growth as a profilin ligand in maize". Journal of Experimental Botany. 64 (8): 2359–72. doi:10.1093/jxb/ert093. PMID 23676884.
- Kondo T, Plaza S, Zanet J, Benrabah E, Valenti P, Hashimoto Y, Kobayashi S, Payre F, Kageyama Y (July 2010). "Small peptides switch the transcriptional activity of Shavenbaby during Drosophila embryogenesis". Science. 329 (5989): 336–9. doi:10.1126/science.1188158. PMID 20647469.
- Pauli A, Norris ML, Valen E, Chew GL, Gagnon JA, Zimmerman S, Mitchell A, Ma J, Dubrulle J, Reyon D, Tsai SQ, Joung JK, Saghatelian A, Schier AF (February 2014). "Toddler: an embryonic signal that promotes cell movement via Apelin receptors". Science. 343 (6172): 1248636. doi:10.1126/science.1248636. PMID 24407481.
- Chng SC, Ho L, Tian J, Reversade B (December 2013). "ELABELA: a hormone essential for heart development signals via the apelin receptor". Developmental Cell. 27 (6): 672–80. doi:10.1016/j.devcel.2013.11.002. PMID 24316148.
- D'Lima NG, Ma J, Winkler L, Chu Q, Loh KH, Corpuz EO, Budnik BA, Lykke-Andersen J, Saghatelian A, Slavoff SA (February 2017). "A human microprotein that interacts with the mRNA decapping complex". Nature Chemical Biology. 13 (2): 174–180. doi:10.1038/nchembio.2249. PMID 27918561.
- Slavoff SA, Heo J, Budnik BA, Hanakahi LA, Saghatelian A (April 2014). "A human short open reading frame (sORF)-encoded polypeptide that stimulates DNA end joining". The Journal of Biological Chemistry. 289 (16): 10950–7. doi:10.1074/jbc.c113.533968. PMID 24610814.
- Guo B, Zhai D, Cabezas E, Welsh K, Nouraini S, Satterthwait AC, Reed JC (May 2003). "Humanin peptide suppresses apoptosis by interfering with Bax activation". Nature. 423 (6938): 456–61. doi:10.1038/nature01627. PMID 12732850.
- Matsumoto A, Pasut A, Matsumoto M, Yamashita R, Fung J, Monteleone E, Saghatelian A, Nakayama KI, Clohessy JG, Pandolfi PP (January 2017). "mTORC1 and muscle regeneration are regulated by the LINC00961-encoded SPAR polypeptide". Nature. 541 (7636): 228–232. doi:10.1038/nature21034. PMID 28024296.