= Small integral membrane protein 14 =

Small integral membrane protein 14, also known as SMIM14 or C4orf34, is a protein encoded on chromosome 4 of the human genome by the SMIM14 gene. SMIM14 has no paralogs and at least 298 orthologs, found mainly in jawed vertebrates. SMIM14 is classified as a type I transmembrane protein. Although the protein is not well characterized, its transmembrane domain may be involved in ER retention.

== Gene ==
The SMIM14 gene is located on the minus strand at cytogenetic band 4p14 and is 92,567 base pairs in length. The gene has five exons, four of which constitute the open reading frame for SMIM14.

The Kozak sequence of SMIM14, the motif that marks the translation initiation site in most eukaryotic mRNA transcripts, is considered strong. There is no signal peptide in SMIM14, but the encoded transmembrane domain acts as the signal sequence. SMIM14 is predicted to contain one disulfide bridge; such bridges stabilize the tertiary (and sometimes quaternary) structure of proteins. There are at least ten polyadenylation sequences in the 3′ UTR of the SMIM14 gene, which signal transcription termination.

SMIM14 is expressed at four times the level of an average gene.

== Gene regulation ==

=== Promoter ===
SMIM14 has seven predicted promoter regions. The promoter with the greatest number of transcripts and CAGE tags is 1,420 base pairs in length. It is found on the minus strand, starting at position 39,638,806 and ending at position 39,640,225. This promoter is associated with five coding transcripts, one of which has a maximum of 105,458 CAGE tags.
| Promoter ID | Start Position | End Position | Length (bp) | Coding Transcripts |
| GXP_150112 | 39,549,547 | 39,550,812 | 1,266 | 0 |
| GXP_3198013 | 39,583,919 | 39,584,958 | 1,040 | 0 |
| GXP_9520406 | 39,605,105 | 39,606,144 | 1,040 | N/A |
| GXP_9520407 | 39,626,490 | 39,627,529 | 1,040 | N/A |
| GXP_6750876 | 39,627,082 | 39,628,121 | 1,040 | 1 |
| GXP_3198015 | 39,638,191 | 39,639,230 | 1,040 | 0 |
| GXP_6750877 | 39,638,806 | 39,640,225 | 1,420 | 5 |
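The lengths in the table follow from the start and end positions if both coordinates are treated as inclusive, which is an assumption about the coordinate convention rather than something the source states. A minimal sketch that recomputes them (the names `PROMOTERS` and `promoter_length` are illustrative):

```python
# Promoter coordinates copied from the table above: (ID, start, end).
PROMOTERS = [
    ("GXP_150112", 39_549_547, 39_550_812),
    ("GXP_3198013", 39_583_919, 39_584_958),
    ("GXP_9520406", 39_605_105, 39_606_144),
    ("GXP_9520407", 39_626_490, 39_627_529),
    ("GXP_6750876", 39_627_082, 39_628_121),
    ("GXP_3198015", 39_638_191, 39_639_230),
    ("GXP_6750877", 39_638_806, 39_640_225),
]


def promoter_length(start: int, end: int) -> int:
    """Length in bp of an interval whose start and end are both inclusive (assumed)."""
    return end - start + 1


# Map each promoter ID to its recomputed length.
lengths = {pid: promoter_length(start, end) for pid, start, end in PROMOTERS}

# The longest promoter is the one with five coding transcripts.
longest = max(lengths, key=lengths.get)
```

Under this convention every recomputed length matches the "Length (bp)" column, and `longest` is GXP_6750877.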
The CpG sites associated with the SMIM14 gene are found in CpG island 76. In addition, several transcription factors can bind to this promoter to drive SMIM14 gene expression.
| Literature-curated Transcription Factors |
| SMARCA4 |
| STAT1 |
| RBL2 |
| TRIM28 |
| EGR1 |
| TFAP2C |

== RNA and expression ==
SMIM14 has three mRNA transcript variants. Transcript variant 1 is the longest variant, with 6,397 base pairs.
| Transcript | Length (bp) | Accession number |
| Transcript variant 1 | 6,397 | NM_001317896.2 |
| Transcript variant 2 | 6,252 | NM_174921 |
| Transcript variant 3 | 6,263 | NM_001317897 |
SMIM14 has high expression in the liver, adrenal gland, colon, and prostate. It is under-expressed in peripheral blood lymphocytes, skeletal muscles, and the heart.

== Protein ==

Transcript variant 1 of SMIM14 encodes a protein of 99 amino acids.

=== Primary structure ===
The predicted molecular weight (Mw) of the SMIM14 protein is 10,710.34 Da. The protein carries no net electrical charge at pH 5.10, its isoelectric point (pI). The abundance of every amino acid is within the normal range for human proteins.

==== Transmembrane domain and motifs ====

SMIM14 has one transmembrane domain, so it is classified as a single-pass membrane protein. The transmembrane domain spans residues 51–70. A dileucine motif is predicted within the domain; such motifs play a role in sorting transmembrane proteins to endosomes and lysosomes. The N-terminus is positioned in the extracellular space, while the C-terminus is located inside the cell, further classifying SMIM14 as a type I transmembrane protein.
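The residue ranges of the three topological regions follow from the protein length (99 amino acids) and the transmembrane span (residues 51–70). A small sketch of that arithmetic, with the helper name `split_single_pass` chosen only for illustration:

```python
PROTEIN_LENGTH = 99          # amino acids, from transcript variant 1
TM_START, TM_END = 51, 70    # transmembrane domain, 1-based inclusive


def split_single_pass(length: int, tm_start: int, tm_end: int) -> dict:
    """Split a single-pass membrane protein into N-terminal, TM, and C-terminal ranges."""
    return {
        "n_terminal": (1, tm_start - 1),
        "transmembrane": (tm_start, tm_end),
        "c_terminal": (tm_end + 1, length),
    }


regions = split_single_pass(PROTEIN_LENGTH, TM_START, TM_END)

# Number of residues in each region (inclusive ranges).
region_lengths = {name: end - start + 1 for name, (start, end) in regions.items()}
```

This gives an N-terminal region of 50 residues, a 20-residue transmembrane domain, and a 29-residue C-terminal tail, in agreement with the 29-residue C-terminus cited in the Orthologs section.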

=== Secondary structure ===
An α-helix is predicted within the transmembrane domain, and SMIM14 is predicted to be randomly coiled near the C-terminus. A random coil denotes the absence of regular secondary structure, so the region adopts a relaxed conformation that is neither interacting nor stabilizing. Extended strands (E-strands) are also predicted throughout the protein. E-strands are likewise a common secondary structure and are often characterized by their involvement in hydrogen bonding with polar side chains.

Within the N-terminus, SMIM14 is predicted to have three palmitoylation sites, which facilitate the clustering of proteins, and one disulfide bridge, which stabilizes the structure of the protein. There is also a predicted glycosaminoglycan site spanning residues 45–48, proximal to the transmembrane domain. The C-terminus is predicted to have two unidentified phosphorylation sites and one PKA phosphorylation site.

=== Subcellular location ===
SMIM14, a transmembrane protein, is usually found in the ER membrane. While there is no conventional ER retention signal within the SMIM14 coding sequence, it has been suggested that the transmembrane domain mediates ER retention.

== Homology ==
SMIM14 has no known paralogs and at least 298 orthologs.

=== Paralogs ===
BLAST searches have identified no paralogs of the SMIM14 gene in Homo sapiens.

=== Orthologs ===
SMIM14 is conserved in most vertebrates, excluding hagfish, lampreys, lobe-finned fish, and lungfish. Among invertebrates, it is conserved in flatworms, roundworms, mollusks, and arthropods. It is also relatively conserved in distant relatives, such as sea anemones and corals.
| Species | Common name | Taxon | Date of divergence (mya) | % Identity | % Similarity | Corrected % Divergence (m) | Accession number |
| Mastomys coucha | Southern multimammate mouse | rodentia | 90 | 87.9 | 98.0 | 12.9 | XP_031198284.1 |
| Phyllostomus discolor | pale spear-nosed bat | mammalia | 96 | 93.4 | 99.0 | 6.70 | XP_028361411.1 |
| Manacus vitellinus | golden-collared manakin | aves | 312 | 85.1 | 91.1 | 16.1 | XP_017923893.1 |
| Python bivittatus | Burmese python | reptilia | 312 | 80.2 | 89.1 | 22.1 | XP_007426519 |
| Nanorana parkeri | high Himalaya frog | amphibia | 352 | 69.2 | 79.8 | 36.8 | XP_018420132.1 |
| Danio rerio | zebrafish | actinopterygii | 435 | 68.0 | 82.5 | 38.6 | NP_991165.1 |
| Rhincodon typus | whale shark | chondrichthyes | 473 | 71.8 | 84.5 | 33.1 | XP_020383770.1 |
| Ciona intestinalis | sea vase | ascidiacea | 676 | 42.7 | 55.3 | 85.1 | XP_026690156.1 |
| Strongylocentrotus purpuratus | Pacific purple sea urchin | echinodermata | 684 | 50.5 | 68.0 | 68.3 | XP_787363.2 |
| Lingula anatina | lamp shell | brachiopoda | 797 | 59.0 | 74.3 | 52.8 | XP_013382479.1 |
| Limulus polyphemus | Atlantic horseshoe crab | arthropoda | 797 | 49.5 | 65.0 | 70.3 | XP_013782563.1 |
| Agrilus planipennis | emerald ash borer | insecta | 797 | 39.8 | 57.3 | 92.1 | XP_018319678.1 |
| Octopus vulgaris | octopus | mollusca | 797 | 51.0 | 64.4 | 67.3 | XP_029637526.1 |
| Strongyloides ratti | threadworm | nematoda | 797 | 33.3 | 48.1 | 110 | XP_024504825.1 |
| Exaiptasia pallida | sea anemone | anthozoa | 824 | 58.2 | 65.5 | 54.1 | XP_020902189.1 |
| Schistosoma haematobium | urinary blood fluke | platyhelminthes | 824 | 37.4 | 53.3 | 98.3 | XP_012793134.1 |
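The table does not state how the corrected divergence column was derived. Most of its values are consistent with a Poisson-style correction for multiple substitutions, m = −ln(identity fraction) × 100; treating that as the formula is an assumption, and the function name `corrected_divergence` is illustrative.

```python
import math


def corrected_divergence(percent_identity: float) -> float:
    """Corrected % divergence, assuming m = -ln(identity fraction) * 100.

    The correction accounts for repeated substitutions at the same site,
    so m grows faster than the raw difference (100 - identity) and can
    exceed 100 for very distant orthologs.
    """
    fraction_identical = percent_identity / 100.0
    return -math.log(fraction_identical) * 100.0


# Spot checks against % identity values from the table above.
mouse = round(corrected_divergence(87.9), 1)     # Mastomys coucha
python_ = round(corrected_divergence(80.2), 1)   # Python bivittatus
sea_vase = round(corrected_divergence(42.7), 1)  # Ciona intestinalis
```

With these inputs the sketch returns 12.9, 22.1, and 85.1, matching the tabulated values for those three species.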
The SMIM14 protein sequence is highly conserved across orthologs in the region proximal to the N-terminus. In stark contrast, the C-terminus is more varied across orthologs. Sequence analysis of human SMIM14 suggests that the C-terminus contains a disproportionate number of proline residues (9 out of 29; 31%) with several proline-rich sequences (PXXP). Proline-rich domains are usually associated with protein-protein interactions; thus, the C-terminus has a high probability of interacting with other proteins.
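Proline content and PXXP motifs of this kind can be counted directly from a sequence string. The sketch below uses a made-up toy peptide (`TOY_TAIL` is not the SMIM14 sequence) and illustrative function names; only the 9-of-29 figure comes from the text.

```python
import re


def proline_fraction(seq: str) -> float:
    """Fraction of residues in seq that are proline (one-letter code P)."""
    return seq.count("P") / len(seq)


def pxxp_positions(seq: str) -> list:
    """0-based start positions of every PXXP motif, including overlapping ones."""
    # A zero-width lookahead lets overlapping motifs each produce a match.
    return [m.start() for m in re.finditer(r"(?=P..P)", seq)]


# Hypothetical toy peptide, used only to exercise the functions.
TOY_TAIL = "PAAPGPPLPK"

toy_fraction = proline_fraction(TOY_TAIL)
toy_motifs = pxxp_positions(TOY_TAIL)

# Proline share reported for the human C-terminus: 9 of 29 residues.
reported_percent = round(9 / 29 * 100)
```

For the toy peptide the proline fraction is 0.5 and PXXP motifs start at positions 0, 3, and 5; the reported 9 of 29 residues rounds to 31%.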

== Protein interactions ==
SMIM14 has been predicted to interact with the FATE1 protein, which is involved in Ca²⁺ transfer from the ER to mitochondria, a regulatory mechanism for apoptosis. It has also been predicted that SMIM14 interacts with LSM4, a glycine-rich protein that plays a role in pre-mRNA splicing.
