= C3orf52 =

Chromosome 3 open reading frame 52 (C3orf52), also known as TTMP or TPA-induced transmembrane protein (accession: NP_078892), is an uncharacterized protein encoded in humans by the C3orf52 gene. The gene (accession: NM_024616) is located on the plus strand of chromosome 3 at locus 3q13.2 and spans approximately 31,822 base pairs. C3orf52 encodes a transmembrane protein predicted to act as a membrane-associated regulatory factor in epithelial tissues. Structural features within the protein, including a conserved transmembrane domain, a SEA domain, and extensive glycosylation, suggest a role in protein-protein interactions. Evidence indicates that C3orf52 acts as a cofactor for LIPH, facilitating the localized lysophosphatidic acid production required for hair follicle morphogenesis. Loss of C3orf52 disrupts this lipid signaling pathway, resulting in autosomal recessive hypotrichosis.

== Expression patterns ==

Immunohistochemical micrographs from the Human Protein Atlas demonstrate C3orf52 expression in colon and stomach tissues. Localization is concentrated along the luminal borders of the epithelial cells in the colon, and is highly abundant in glandular cells. These findings suggest that C3orf52 exhibits membrane-associated expression.

Analysis of human tissue expression indicates that C3orf52 is tissue-restricted and highly regulated, with approximately fourfold variation across tissues. The highest expression is observed in the thyroid and salivary glands (Figure 2), while other sources report moderately high expression in the skin, pancreas, and stomach. Expression in other analyzed tissues, such as the heart and brain, is low to undetectable.

== Molecular features ==

=== mRNA ===
C3orf52 has two transcript variants. Transcript variant 1 is described as the shorter transcript but encodes the longer isoform (TPA-induced transmembrane protein isoform 1). Transcript variant 2 encodes the shorter isoform (TPA-induced transmembrane protein isoform 2), which has a different C-terminus. The nucleotide counts given for the two variants (753 and 654, respectively) match the coding-sequence lengths of the two isoforms, stop codon included, rather than full transcript lengths. Transcript variant 2 consists of six exons.

=== Protein ===
C3orf52 encodes two isoforms. TPA-induced transmembrane protein isoform 1 is 250 amino acids long, while isoform 2 is 217 amino acids long. Isoform 2 is the predominant isoform of C3orf52 in humans and was used as the basis for subsequent research on this protein. The C3orf52 protein includes a disordered region and a transmembrane region; the mRNA contains a major polyA site. The predicted molecular weight is 24.3 kDa. The protein is predicted to localize primarily to the cytoplasm (94.1%), with specific localization to the endoplasmic reticulum (44.4%).
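The isoform lengths above can be cross-checked against the nucleotide counts quoted in the mRNA subsection with simple codon arithmetic. The sketch below assumes that those counts refer to the coding sequence including one stop codon; that reading is an assumption, and the function name is illustrative only.

```python
def cds_length_nt(protein_length_aa: int) -> int:
    """Coding-sequence length in nucleotides, counting one stop codon."""
    return 3 * (protein_length_aa + 1)


# Isoform lengths as stated in the text.
isoform_lengths_aa = {"isoform 1": 250, "isoform 2": 217}

for name, length_aa in isoform_lengths_aa.items():
    print(f"{name}: {length_aa} aa -> {cds_length_nt(length_aa)} nt")
```

Under this assumption, 250 and 217 amino acids correspond to 753 and 654 nucleotides, the same figures quoted for transcript variants 1 and 2.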

Analysis of the human C3orf52 protein using the SAPS tool showed no significant positive, negative, or mixed charge clusters, and no known sequence patterns were identified.

Compositional analysis indicates reduced levels of alanine, histidine, and arginine residues and an increased number of glutamic acid residues compared to standard human protein levels, indicating that this protein is acidic. C3orf52 has a predicted isoelectric point of 3.99. Additionally, results show a high-scoring transmembrane segment spanning amino acids 66 to 93, which is also among the protein's most hydrophobic segments. C3orf52 contains a repeated four-amino-acid motif, "LELS", at amino acids 13–16 and again at positions 101–104; this region is not conserved among orthologs.
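A repeated motif such as "LELS" can be located with a plain substring scan that reports 1-based, inclusive residue ranges. The sketch below is illustrative only: `find_motif` is a hypothetical helper, and `toy_sequence` is a synthetic placeholder, not the real C3orf52 sequence, built so that the motif falls at the positions quoted above.

```python
def find_motif(sequence: str, motif: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of every motif occurrence."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        # Step one residue forward so overlapping occurrences are also found.
        start = sequence.find(motif, start + 1)
    return hits


# Synthetic placeholder sequence (NOT the real protein): filler residues
# with "LELS" placed at positions 13-16 and 101-104.
toy_sequence = "A" * 12 + "LELS" + "G" * 84 + "LELS" + "A" * 10

motif_hits = find_motif(toy_sequence, "LELS")
print(motif_hits)
```

Running the same function on the actual isoform 2 sequence (accession NP_078892) would be the way to confirm the reported positions.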

The C3orf52 protein has a predicted SEA (Sea urchin sperm protein, Enterokinase, Agrin) domain spanning amino acid positions 128–172. Approximately half of the amino acids in this domain of the human protein are conserved among 70% of the orthologs listed in Table 1. All orthologs in the table nevertheless have this domain in the corresponding region of the protein, although they contain different residues. SEA domains are common in eukaryotes and are found within extracellular proteins located in highly glycosylated environments.

== Post-translational modifications ==
Human C3orf52 is predicted to contain three phosphorylation sites at positions 140, 180, and 183, two N-glycosylation sites at positions 106 and 159, and three O-linked glycosylation sites at positions 7, 26, and 37. All of the O-linked glycosylation sites are within the disordered region of this protein. This pattern suggests that C3orf52 is a moderately regulated protein that likely functions more as a scaffold than as a structural protein.
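The predicted sites can be compared with the sequence features whose coordinates are given in the article (transmembrane segment 66–93, SEA domain 128–172) using a simple interval lookup. This is a minimal sketch with illustrative names; the boundaries of the disordered region are not stated in the text, so it is left out rather than guessed.

```python
# Region coordinates as stated in the article (1-based, inclusive).
REGIONS = {
    "transmembrane segment": (66, 93),
    "SEA domain": (128, 172),
}

# Predicted modification sites as stated in the article.
PTM_SITES = {
    "phosphorylation": [140, 180, 183],
    "N-glycosylation": [106, 159],
    "O-glycosylation": [7, 26, 37],
}


def region_of(position: int) -> str:
    """Name of the listed region containing the position, if any."""
    for name, (start, end) in REGIONS.items():
        if start <= position <= end:
            return name
    return "outside listed regions"


ptm_regions = {
    (kind, position): region_of(position)
    for kind, sites in PTM_SITES.items()
    for position in sites
}

for (kind, position), region in ptm_regions.items():
    print(f"{kind} at {position}: {region}")
```

With these coordinates, the phosphorylation site at 140 and the N-glycosylation site at 159 fall inside the predicted SEA domain, and none of the listed sites fall inside the transmembrane segment.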

== Evolutionary history ==

=== Paralogs ===
An NCBI protein BLAST search showed no known paralogs of C3orf52 in humans.

=== Orthologs ===
C3orf52 has identifiable orthologs exclusively within vertebrates (see Table 1 below), extending as far as the cartilaginous fishes (Chondrichthyes). Outside of vertebrates, including all invertebrate animals, bacteria, archaea, fungi, plants, and protists, no identifiable sequence homology is detected for this protein. The most distant homolog detected was in the elephant shark (Callorhinchus milii), whose lineage diverged from that of humans about 495.2 million years ago, suggesting that this is approximately when the C3orf52 gene first arose.

Table 1. Human C3orf52 orthologs sorted by estimated date of divergence from humans and protein sequence identity.

| Genus and species | Common name | Taxonomic group | Date of divergence (MYA) | Sequence length (AA) | AA identity (%) | AA similarity (%) | Accession number |
| Homo sapiens | Human | Primates | 0 | 217 | 100 | 100 | NP_078892.3 |
| Pan troglodytes | Chimpanzee | Primates | 6.4 | 217 | 97 | 98 | XP_001154447.2 |
| Trachypithecus francoisi | Francois' leaf monkey | Primates | 28.8 | 217 | 89 | 93 | XP_033067213.1 |
| Equus przewalskii | Przewalski's horse | Perissodactyla | 94 | 250 | 77 | 87 | XP_008528528.1 |
| Diceros bicornis minor | South-central black rhinoceros | Perissodactyla | 94 | 251 | 74 | 85 | XP_058411758.1 |
| Tachyglossus aculeatus | Australian echidna | Monotremata | 180 | 214 | 52 | 67 | XP_038622330.1 |
| Struthio camelus | Common ostrich | Struthioniformes | 319 | 218 | 37 | 54 | XP_068764855.1 |
| Empidonax traillii | Willow flycatcher | Passeriformes | 319 | 211 | 36 | 53 | XP_027757055.1 |
| Mauremys reevesii | Chinese pond turtle | Testudines | 319 | 233 | 43 | 58 | XP_039395838.1 |
| Carettochelys insculpta | Pig-nosed turtle | Testudines | 319 | 214 | 41 | 56 | XP_074839613.1 |
| Chelonoidis abingdonii | Pinta island tortoise | Testudines | 319 | 218 | 40 | 54 | XP_074928790.1 |
| Ambystoma mexicanum | Axolotl | Caudata | 352 | 239 | 42 | 56 | XP_069492559.1 |
| Rhinatrema bivittatum | Rhinatrema | Gymnophiona | 352 | 217 | 41 | 60 | XP_029434280.1 |
| Pleurodeles waltl | Iberian ribbed newt | Caudata | 352 | 236 | 40 | 52 | XP_069060250.1 |
| Microcaecilia unicolor | Tiny cayenne caecilian | Gymnophiona | 352 | 225 | 39 | 58 | XP_030060067.1 |
| Siphateles boraxobius | Borax lake chub | Cypriniformes | 429 | 248 | 34 | 52 | XP_077063864.1 |
| Acipenser ruthenus | Sterlet | Acipenseriformes | 429 | 260 | 33 | 52 | XP_058884635.1 |
| Etheostoma spectabile | Orangethroat darter | Perciformes | 429 | 268 | 32 | 49 | XP_032360225.1 |
| Pseudorasbora parva | Stone moroko | Cypriniformes | 429 | 251 | 30 | 49 | XP_067307703.1 |
| Carcharodon carcharias | Great white shark | Lamniformes | 462 | 376 | 29 | 50 | XP_041067311.1 |
| Callorhinchus milii | Elephant shark | Chimaeriformes | 495 | 835 | 18 | 40 | XP_007894202.1 |

=== Protein divergence ===
C3orf52 is evolving more quickly than commonly used reference proteins, including cytochrome c and fibrinogen alpha chain. This may reflect selective pressure on the protein driving its rapid sequence change.
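One way to put numbers on divergence is to convert the identity values in Table 1 into evolutionary distances. The sketch below computes the uncorrected p-distance and a Poisson-corrected distance for a subset of the orthologs. The Poisson correction is a standard textbook choice assumed here for illustration, not necessarily the method behind the comparison above, and the function names are hypothetical. Comparing against cytochrome c or fibrinogen alpha chain would require their identity values, which are not given in this article.

```python
import math

# (common name, divergence date in MYA, amino acid identity in %),
# copied from a subset of Table 1.
ORTHOLOGS = [
    ("Chimpanzee", 6.4, 97),
    ("Przewalski's horse", 94, 77),
    ("Australian echidna", 180, 52),
    ("Common ostrich", 319, 37),
    ("Axolotl", 352, 42),
    ("Sterlet", 429, 33),
    ("Great white shark", 462, 29),
    ("Elephant shark", 495, 18),
]


def p_distance(identity_percent: float) -> float:
    """Uncorrected proportion of differing residues."""
    return 1.0 - identity_percent / 100.0


def poisson_distance(identity_percent: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p), with p the p-distance."""
    return -math.log(identity_percent / 100.0)


for name, mya, identity in ORTHOLOGS:
    print(
        f"{name:20s} {mya:6.1f} MYA  "
        f"p = {p_distance(identity):.2f}  d = {poisson_distance(identity):.2f}"
    )
```

Plotting the corrected distance against divergence date for C3orf52 and for the reference proteins would make differences in slope, and therefore in rate, directly visible.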

== Protein interactions ==
| Protein | ID | Description | Detection method | Subcellular location |
| Hypotrichosis 7 | HYPT7 | Lipase H (LIPH); mutations in the lipase H gene are linked to autosomal recessive hypotrichosis | affinity chromatography technology | Endoplasmic reticulum and luminal side of membranes |
| Chromosome 19 open reading frame 75 | C19orf75 | Uncharacterized protein | affinity chromatography technology | Predicted membrane-associated |
| Neonatal fragment crystallizable receptor | FCRN | Neonatal Fc receptor. Responsible for transferring immunity from mother to newborn | affinity chromatography technology | Endosomal membrane, plasma membrane, and endoplasmic reticulum |
| Transmembrane protein 30B | TMEM30B | Subunit of P4-ATPase complex which is involved in the transport of lipids across the cell membrane | affinity chromatography technology | Endoplasmic reticulum and plasma membrane |
| B-cell antigen receptor complex-associated protein beta chain | CD79b | Component of the B-cell receptor; required to initiate the signaling cascade when antigen binds to the B cell | affinity chromatography technology | Plasma membrane |
| Platelet glycoprotein 4 | CD36 | Multifunctional glycoprotein that acts as a receptor for molecules such as fatty acids, collagen, and thrombospondin | affinity chromatography technology | Plasma membrane and endosome |
Table 2. Protein interactions found using BioGRID. All interactions were physical and had high confidence scores.

A STRING protein association search yielded no high-confidence results and none that appear significant in light of previous findings.

== Conceptual translation ==

A conceptual translation of human C3orf52 protein isoform 2, along with the full mRNA and annotations, is shown in Figure 4.

== Clinical significance ==
Published studies of C3orf52 have focused mainly on the likely involvement of this gene in lipase H-mediated lysophosphatidic acid biosynthesis, a step in hair-follicle formation. Decreased expression of C3orf52 has been linked to localized autosomal recessive hypotrichosis, a condition characterized by sparse or absent hair. Three single-nucleotide polymorphisms with clinical significance are linked to hypotrichosis 15 (rs764787339, rs2472299130, rs545208237) (Table 3).

Apart from its involvement in hair loss, literature indexed in PubMed and Google Scholar suggests possible links between C3orf52 and several cancers. One study found an association between this gene and the development of multifocal and multicentric breast cancer, and is evaluating it as a marker for distinguishing multifocal and multicentric breast cancer from unifocal breast cancer.

Another study, which examined DNA copy number variations common in cancer cells, proposes C3orf52 as a potential prognostic marker. Additionally, C3orf52 is reported to be downregulated in clear-cell renal cell carcinoma, where reduced expression was associated with later disease stage and poorer overall survival.

=== Single nucleotide polymorphisms ===
| SNP | Position | Base change | AA change | Mutation type | Significance | Clinical significance |
| rs764787339 | bp 34 | G to A | Glu12Lys | Missense variant | Within the disordered region; also within the very beginning of the coding sequence | Hypotrichosis 15 |
| rs2472299130 | bp 438 - 442 | N/A | Thr148fs | Deletion | Directly following a string of highly conserved AA | Hypotrichosis 15 |
| rs545208237 | bp 492 | T to A | Tyr164Ter | Stop gained | On a conserved AA | Hypotrichosis 15 |
| rs16859190 | bp 331 | A to G | I to V | Missense variant | On a non-conserved AA within a string of conserved AA | None |
| | bp 430 | G to A | G to R | Missense variant | Within SEA domain | None |
| rs16859172 | bp 197 | T to C | L to P | Missense variant | On a conserved AA, within transmembrane region | None |
| rs111954756 | bp 566 | G to C | G to A | Missense variant | On a non-conserved AA within a string of conserved AA | None |
Table 3. Summary of common single-nucleotide polymorphisms within human C3orf52, including their positions and significance. Single-nucleotide polymorphisms were found using Variation Viewer.
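The nucleotide positions in Table 3 can be related to residue numbers by integer division. The sketch below assumes the "bp" values are 1-based positions within the coding sequence, an assumption that is consistent with the rows where the table gives a residue number (for example, bp 34 and Glu12Lys). The function name is illustrative.

```python
def codon_number(cds_position: int) -> int:
    """1-based codon index for a 1-based coding-sequence nucleotide position."""
    return (cds_position - 1) // 3 + 1


# (SNP label, nucleotide position(s) from Table 3)
SNP_POSITIONS = [
    ("rs764787339", [34]),
    ("rs2472299130", [438, 442]),  # first and last deleted nucleotide
    ("rs545208237", [492]),
    ("rs16859190", [331]),
    ("unlabeled SNP at bp 430", [430]),
    ("rs16859172", [197]),
    ("rs111954756", [566]),
]

snp_codons = {
    label: [codon_number(position) for position in positions]
    for label, positions in SNP_POSITIONS
}

for label, codons in snp_codons.items():
    print(label, "-> codon(s)", codons)
```

Under this assumption, bp 197 maps to codon 66, the first residue of the predicted transmembrane segment (66–93), and bp 430 maps to codon 144, inside the predicted SEA domain (128–172), in agreement with the "Significance" column.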

| SNP | Trait | Location |
| rs12053863-? | Glucose metabolism | 3:112100677 |
| rs79754744-G | Eosinophil count | 3:112111457 |
| rs76093951-T | Bilirubin measurement | 3:112117176 |
| rs7649379-? | Eosinophil count | 3:112121666 |
| rs1492488-C | Eosinophil percentage of leukocytes | 3:112122499 |
Table 4. Summary of GWAS Catalog results. Most of the associated traits are immune-related, particularly eosinophil count.
