= Chromosome 5 open reading frame 47 =

Chromosome 5 Open Reading Frame 47, or C5ORF47, is a protein that in humans is encoded by the C5ORF47 gene. It is also known by the alias LOC133491. The human C5ORF47 gene is primarily expressed in the testis.

== Gene ==
C5ORF47 is located at 5q35.2, on the long arm of chromosome 5. The full gene spans 16,911 nucleotides, and the mRNA transcript, made up of 5 exons, spans 2,511 nucleotides.

=== Gene expression ===
Human C5ORF47 is primarily expressed in the testis. It is also expressed at low levels in many other tissues (stomach, lung, kidney, intestine, heart, and adrenal gland), with the level and timing of expression varying throughout fetal development.

== Transcript ==
The mRNA sequence of C5ORF47 is 2,511 nucleotides long and consists of 5 exons, which are separated by 4 introns in the unspliced transcript.

The human C5ORF47 gene has two known isoforms. The first (XP_016864517.1) encodes a protein that is 176 amino acids in length, and the second (XP_011532733.1) encodes a protein that is 150 amino acids in length. This article focuses primarily on the first, more common isoform.

== Protein ==
The molecular weight of the unmodified precursor human C5ORF47 protein is 19.2 kDa, and the isoelectric point is predicted to be 10.49.
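
As a rough consistency check, 19.2 kDa spread over 176 residues is about 109 Da per residue, close to the commonly quoted average of roughly 110 Da per amino acid residue. The sketch below shows how such a molecular weight can be estimated by summing average residue masses and adding one water molecule. The function name `estimate_mass_da` and the test dipeptide are illustrative assumptions; the real C5ORF47 sequence is not reproduced here.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def estimate_mass_da(sequence: str) -> float:
    """Estimate the average mass (Da) of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


# Hypothetical dipeptide, used only to show the call (not C5ORF47).
print(round(estimate_mass_da("GA"), 2))
```

Applied to the full 176-residue sequence from a protein database, the same function should give a value near the reported 19.2 kDa, provided no post-translational modifications are counted.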

The protein is basic and appears to have a relatively high proportion of the positively charged amino acids lysine and arginine compared with the negatively charged amino acids aspartic acid and glutamic acid. The repeated sequence “SQLR” occurs at two locations in the protein.
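
The two observations above (an excess of basic over acidic residues, and a repeated short motif) can be checked on any sequence with a few lines of code. The following is a minimal sketch: the function names and the toy sequence are assumptions made for illustration, and the toy sequence is not the real C5ORF47 protein.

```python
def charge_composition(sequence: str) -> dict:
    """Count basic (K, R) and acidic (D, E) residues in a protein sequence."""
    seq = sequence.upper()
    basic = sum(seq.count(aa) for aa in "KR")
    acidic = sum(seq.count(aa) for aa in "DE")
    # "net" is a crude residue-count difference, not a true charge at a given pH.
    return {"basic": basic, "acidic": acidic, "net": basic - acidic}


def motif_positions(sequence: str, motif: str) -> list:
    """Return 1-based start positions of every occurrence of a motif."""
    seq, motif = sequence.upper(), motif.upper()
    return [i + 1 for i in range(len(seq) - len(motif) + 1)
            if seq.startswith(motif, i)]


# Hypothetical toy sequence, NOT the real C5ORF47 protein.
toy = "MSQLRKKAESQLRDK"
print(charge_composition(toy))      # {'basic': 5, 'acidic': 2, 'net': 3}
print(motif_positions(toy, "SQLR")) # [2, 10]
```

Histidine is left out of the basic count here because it is only weakly basic at physiological pH; including it is a design choice that depends on the analysis.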

=== Domains ===
The human C5ORF47 protein contains DUF4680, a Domain of Unknown Function characterized by two conserved amino acid sequence motifs: VISRM and ENE.

=== Structure ===

Within the predicted tertiary structure of C5ORF47, the most conserved amino acids fall within the Domain of Unknown Function: DUF4680.

=== Cellular Localization ===
The human C5ORF47 protein is predicted to be localized in the nucleus.

A stretch of five positively charged lysines at positions 133-137 of the amino acid sequence is consistent with a nuclear localization sequence.
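
A run of consecutive lysines like the one described above can be located with a simple regular expression. This is only a sketch: the function name `lysine_runs` and the toy sequence are assumptions, and a lysine run is merely suggestive of a nuclear localization sequence, since dedicated predictors use richer models.

```python
import re


def lysine_runs(sequence: str, min_length: int = 5) -> list:
    """Return (start, end) 1-based inclusive positions of lysine runs."""
    pattern = re.compile(r"K{%d,}" % min_length)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, NOT the real C5ORF47 protein.
toy = "MAST" + "KKKKK" + "LE"
print(lysine_runs(toy))  # [(5, 9)]
```

Run against the full 176-residue isoform, a hit reported as `(133, 137)` would correspond to the stretch described in the text.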

Immunohistochemical staining of human testis (Sigma-Aldrich) shows moderate nuclear positivity in cells of the seminiferous ducts.

=== Post-Translational Modifications ===

Predicted phosphorylation sites (T19, S23, S96, S129, S149, Y158, S168, and S169) and O-linked glycosylation sites (S96, S105, and S168) are conserved among most mammalian orthologs.
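
Comparing the two site lists shows that S96 and S168 appear in both, meaning these serines are predicted targets for either modification. The short sketch below makes that comparison explicit; the helper names are illustrative, and the source does not state whether the two modifications compete at these positions.

```python
# Site labels copied from the lists in the text above.
PHOSPHO_SITES = ["T19", "S23", "S96", "S129", "S149", "Y158", "S168", "S169"]
O_GLYCO_SITES = ["S96", "S105", "S168"]


def parse_site(site: str) -> tuple:
    """Split a label such as 'S96' into (residue letter, position)."""
    return site[0], int(site[1:])


def shared_sites(a: list, b: list) -> list:
    """Return labels present in both lists, ordered by sequence position."""
    return sorted(set(a) & set(b), key=lambda s: parse_site(s)[1])


print(shared_sites(PHOSPHO_SITES, O_GLYCO_SITES))  # ['S96', 'S168']
```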

== Homology ==

Orthologs of the human C5ORF47 gene can be found in mammals, birds, and reptiles, but not in amphibians, fish, or invertebrates. No paralogs of the human C5ORF47 gene are known.
| Class | Taxonomic order | Genus and species | Common name | Median date of divergence (MYA) | Sequence length (amino acids) | Sequence identity to human protein (%) | Sequence similarity to human protein (%) |
| Mammalia | Primates | Homo sapiens | Human | 0 | 176 | 100 | 100 |
| | Primates | Gorilla gorilla | Gorilla | 8.6 | 176 | 97.7 | 98 |
| | Rodentia | Mus musculus | House Mouse | 87 | 165 | 50 | 63.1 |
| | Chiroptera | Pteropus giganteus | Indian flying fox | 94 | 176 | 58 | 64.9 |
| | Carnivora | Neomonachus schauinslandi | Hawaiian monk seal | 94 | 178 | 56.7 | 63.9 |
| Aves | Passeriformes | Onychostruthus taczanowskii | White-rumped snowfinch | 319 | 173 | 33.2 | 42.8 |
| | Casuariiformes | Dromaius novaehollandiae | Emu | 319 | 182 | 32 | 44.5 |
| | Passeriformes | Taeniopygia guttata | Zebra Finch | 319 | 177 | 30.8 | 39.3 |
| | Apterygiformes | Apteryx rowi | Okarito Kiwi | 319 | 180 | 30.5 | 41 |
| | Caprimulgiformes | Antrostomus carolinensis | Chuck-will's-widow | 319 | 229 | 26.8 | 37 |
| | Apodiformes | Calypte anna | Anna's hummingbird | 319 | 228 | 24 | 35.6 |
| | Galliformes | Phasianus colchicus | Ring-necked Pheasant | 319 | 263 | 20.5 | 30.9 |
| Reptilia | Squamata | Zootoca vivipara | Viviparous lizard | 319 | 205 | 30 | 42.3 |
| | Squamata | Podarcis muralis | Common wall lizard | 319 | 243 | 28.9 | 39.9 |
| | Testudines | Dermochelys coriacea | Leatherback sea turtle | 319 | 242 | 27.6 | 41.3 |
| | Testudines | Gopherus flavomarginatus | Bolson Tortoise | 319 | 235 | 26.6 | 38.1 |
| | Testudines | Chelonoidis abingdonii | Pinta Island tortoise | 319 | 234 | 26.6 | 35.5 |
| | Squamata | Sceloporus undulatus | Eastern fence lizard | 319 | 241 | 24.7 | 34.9 |
| | Crocodilia | Alligator mississippiensis | American Alligator | 319 | 307 | 22.8 | 32.2 |
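
Identity percentages like those in the table are derived from pairwise alignments. The sketch below shows one common way to compute percent identity from two pre-aligned sequences. It is an illustration under stated assumptions: the source does not say which alignment tool or denominator convention produced the table values, the function name is hypothetical, and the toy sequences are not real orthologs. Similarity would additionally require a substitution matrix or residue grouping, which is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding identical residues.

    Both inputs must be pre-aligned, equal-length strings with '-' for gaps.
    Columns where both sequences have a gap are ignored; the denominator is
    the number of remaining columns (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment: 5 identical columns out of 7.
print(round(percent_identity("MKV-LSR", "MKIALSR"), 1))  # 71.4
```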

== Interacting Proteins ==
Proteins that are predicted to interact with the human C5ORF47 protein tend to be testis-specific, associated with sperm or spermatogenesis, or involved in cilia/flagella formation.
| Interacting Protein | Full Name | Cellular Compartment | Function |
| CCDC185 | Coiled-coil domain-containing protein 185 | Cellular localization unknown | Has a role in ciliogenesis (by similarity). Required for proper cephalic and left/right axis development. |
| C10orf120 | Uncharacterized protein C10orf120 | Cellular localization unknown | Diseases associated with C10orf120 include congenital bilateral aplasia of the vas deferens, which occurs in males when the tubes that carry sperm out of the testes (the vas deferens) fail to develop properly. |
| C4orf22 | Uncharacterized protein C4orf22 | Predicted to be located in cytoplasm. | Cilia- and flagella-associated protein. |
| TGIF2LX | TGFB-induced factor homeobox 2-like, X-linked; Homeobox protein TGIF2LX | Predicted to be located in the nucleus | May have a transcription role in testis. Testis-specific expression suggests that this gene may play a role in spermatogenesis. |
| ZPLD1 | Zona pellucida-like domain-containing protein 1 | | Glycoprotein that is a component of the gelatinous extracellular matrix in the cupulae of the vestibular organ. |
| SPERT | Spermatid-associated protein | Predicted to be located in cytoplasmic vesicle. | Enables identical protein binding activity. |
| ZNF606 | Zinc finger protein 606 | Predicted to be located in the nucleus | Nuclear protein that can act as a transcriptional repressor of growth factor-mediated signaling pathways. Reduced expression of this gene promotes chondrocyte differentiation. |
| C3orf20 | Uncharacterized protein C3orf20 | Predicted to be located in cytoplasm. | Unknown function |
| C14orf119 | Uncharacterized protein C14orf119 | Located in cytosol and mitochondria. | Unknown function |

== Clinical Significance ==
In a study conducted to identify rare genetic variants contributing to neuromyelitis optica in Finland, four missense variants shared by two patients were found in C3ORF20, PDZD2, C5ORF47, and ZNF606.

Microarray data show that human C5ORF47 expression is low in an individual with teratozoospermia, a condition in which more than 85% of spermatozoa have abnormal morphology.

Microarray data show that human C5ORF47 expression is lower in p63-depleted cells. The p63 protein functions as a transcription factor that helps regulate numerous cell activities, including cell proliferation, cell maintenance, differentiation, cell adhesion, and apoptosis. It also plays a critical role in the formation of ectodermal structures in early development, and studies suggest that it plays essential roles in the development of the limbs, facial features, urinary system, and other organs and tissues.
