= FAM200B =

FAM200B (family with sequence similarity 200 member B) is a protein that in humans is encoded by the FAM200B gene. The gene encodes a 657-amino-acid protein. FAM200B is a large intracellular protein with no well-defined functional domains. No experimentally determined structure is available, although predicted tertiary structures exist. FAM200B is expressed moderately and ubiquitously across tissues, with relatively higher expression in the brain and thymus. Although its function remains unknown, its predicted nuclear localization and expression pattern suggest that FAM200B may be involved in regulatory processes such as gene expression or protein-protein interactions.

== Gene ==

FAM200B, also known as C4orf54, is a protein-coding gene located on chromosome 4p15.32. The gene spans 4,287 nucleotides and contains two exons. It has multiple transcript variants that encode two protein isoforms.

== Transcripts ==
The canonical FAM200B transcript is NM_001145191.2, which spans approximately 4.3 kb and consists of two exons, with the second exon (bases 76–4287) containing nearly the entire coding sequence. Multiple transcript variants of FAM200B encode two protein isoforms (see Table 1).

Table 1: Transcript and protein isoforms of the human FAM200B gene.
| Transcript | Length (nt) | Protein | Length (aa) | Isoform |
| NM_001145191.2 | 4,287 | NP_001138663.1 | 657 | MANE Select |
| XM_017008048.2 | 4,397 | XP_016863537.1 | 657 | X1 |
| XM_024453999.2 | 3,822 | XP_024309767.1 | 657 | X1 |
| XM_024454000.2 | 3,818 | XP_024309768.1 | 657 | X1 |
| XM_024454001.2 | 3,942 | XP_024309769.1 | 657 | X1 |
| XM_024454003.2 | 3,938 | XP_024309771.1 | 657 | X1 |
| XM_024454005.2 | 3,803 | XP_024309773.1 | 657 | X1 |
| XM_024454006.2 | 3,799 | XP_024309774.1 | 657 | X1 |
| XM_024454008.2 | 3,749 | XP_024309776.1 | 480 | X2 |
| XM_024454009.2 | 3,869 | XP_024309777.1 | 480 | X2 |
| XM_024454010.2 | 3,979 | XP_024309778.1 | 480 | X2 |
| XM_024454011.2 | 3,730 | XP_024309779.1 | 480 | X2 |
| XM_047450103.1 | 4,464 | XP_047306059.1 | 657 | X1 |
| XM_047450104.1 | 4,517 | XP_047306060.1 | 657 | X1 |
| XM_047450106.1 | 4,342 | XP_047306062.1 | 657 | X1 |
| XM_047450107.1 | 4,378 | XP_047306063.1 | 657 | X1 |
| XM_047450108.1 | 3,889 | XP_047306064.1 | 657 | X1 |
| XM_047450109.1 | 3,885 | XP_047306065.1 | 657 | X1 |
| XM_047450110.1 | 4,480 | XP_047306066.1 | 657 | X1 |
| XM_047450112.1 | 4,840 | XP_047306068.1 | 657 | X1 |
| XM_047450113.1 | 3,816 | XP_047306069.1 | 480 | X2 |
| XM_047450114.1 | 3,926 | XP_047306070.1 | 480 | X2 |
| XM_047450115.1 | 3,859 | XP_047306071.1 | 480 | X2 |
| XM_047450117.1 | 3,804 | XP_047306073.1 | 480 | X2 |
| XM_054349762.1 | 4,559 | XP_054205737.1 | 657 | X1 |
| XM_054349763.1 | 4,612 | XP_054205738.1 | 657 | X1 |
| XM_054349764.1 | 4,394 | XP_054205739.1 | 657 | X1 |
| XM_054349765.1 | 4,339 | XP_054205740.1 | 657 | X1 |
| XM_054349766.1 | 4,375 | XP_054205741.1 | 657 | X1 |
| XM_054349767.1 | 3,819 | XP_054205742.1 | 657 | X1 |
| XM_054349768.1 | 3,815 | XP_054205743.1 | 657 | X1 |
| XM_054349769.1 | 3,984 | XP_054205744.1 | 657 | X1 |
| XM_054349770.1 | 3,980 | XP_054205745.1 | 657 | X1 |
| XM_054349771.1 | 4,037 | XP_054205746.1 | 657 | X1 |
| XM_054349772.1 | 4,033 | XP_054205747.1 | 657 | X1 |
| XM_054349773.1 | 4,477 | XP_054205748.1 | 657 | X1 |
| XM_054349774.1 | 4,837 | XP_054205749.1 | 657 | X1 |
| XM_054349775.1 | 3,800 | XP_054205750.1 | 657 | X1 |
| XM_054349776.1 | 3,796 | XP_054205751.1 | 657 | X1 |
| XM_054349777.1 | 3,746 | XP_054205752.1 | 480 | X2 |
| XM_054349778.1 | 3,911 | XP_054205753.1 | 480 | X2 |
| XM_054349779.1 | 3,964 | XP_054205754.1 | 480 | X2 |
| XM_054349780.1 | 4,074 | XP_054205755.1 | 480 | X2 |
| XM_054349781.1 | 4,021 | XP_054205756.1 | 480 | X2 |
| XM_054349782.1 | 3,856 | XP_054205757.1 | 480 | X2 |
| XM_054349783.1 | 3,727 | XP_054205758.1 | 480 | X2 |
| XM_054349784.1 | 3,801 | XP_054205759.1 | 480 | X2 |
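
The isoform assignments in Table 1 can be tallied programmatically. The following is a minimal Python sketch that parses pipe-delimited rows and counts records per isoform label and protein length. It uses only a four-row subset copied from Table 1, and the helper names `parse_row` and `tally_isoforms` are hypothetical, introduced here for illustration.

```python
from collections import Counter

# A small subset of rows copied from Table 1 (pipe-delimited, as in the table).
ROWS = """\
| NM_001145191.2 | 4,287 | NP_001138663.1 | 657 | MANE Select |
| XM_017008048.2 | 4,397 | XP_016863537.1 | 657 | X1 |
| XM_024454008.2 | 3,749 | XP_024309776.1 | 480 | X2 |
| XM_024454009.2 | 3,869 | XP_024309777.1 | 480 | X2 |
"""


def parse_row(line):
    """Split one pipe-delimited table row into a typed record."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    transcript, length_nt, protein, length_aa, isoform = cells
    return {
        "transcript": transcript,
        "length_nt": int(length_nt.replace(",", "")),  # "4,287" -> 4287
        "protein": protein,
        "length_aa": int(length_aa),
        "isoform": isoform,
    }


def tally_isoforms(text):
    """Count records per (isoform label, protein length) pair."""
    records = [parse_row(line) for line in text.splitlines() if line.strip()]
    return Counter((r["isoform"], r["length_aa"]) for r in records)


if __name__ == "__main__":
    for (isoform, length_aa), n in tally_isoforms(ROWS).items():
        print(f"{isoform} ({length_aa} aa): {n} transcript(s)")
```

Pasting the remaining rows of Table 1 into `ROWS` would extend the tally to the full table.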

== Protein ==
Human FAM200B encodes two protein isoforms: a longer 657-amino-acid isoform (isoform X1) and a shorter 480-amino-acid isoform (isoform X2); see Table 1. The canonical transcript is NM_001145191.2, while the remaining variants are predicted models. The predicted molecular weight is ~76.0 kDa and the predicted isoelectric point (pI) is approximately 8.33. The amino acid composition is enriched for Leu (~12.9%), Ser (~8.8%), Lys (~8.2%), and Glu (~8.1%). FAM200B has no low-complexity regions, long tandem repeats, or significant charge clusters, and its charged residues are distributed evenly throughout the sequence. Only short, localized periodic motifs were detected, consistent with a soluble intracellular protein lacking large repetitive domains. Although no experimentally validated domains have been defined, homology analyses across vertebrate orthologs identified conserved C2H2- and BED-type zinc finger motifs in FAM200B. Secondary structure analysis predicts a mixture of α-helices and β-strands, concentrated in a conserved central region of the protein. The predicted tertiary structure suggests that FAM200B adopts a mostly globular fold with a well-structured core and more flexible N- and C-terminal regions.
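
Composition percentages such as those quoted above are obtained by counting each residue and dividing by the sequence length. The sketch below illustrates the calculation in Python. The peptide `toy` is a hypothetical ten-residue example, not the FAM200B sequence, and the function name `aa_composition` is an assumption made for illustration.

```python
from collections import Counter


def aa_composition(seq):
    """Return {residue: percent of sequence} for a one-letter protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical toy peptide for illustration only; NOT the FAM200B sequence.
toy = "MLLSKELLSK"

if __name__ == "__main__":
    comp = aa_composition(toy)
    # List residues from most to least abundant.
    for aa, pct in sorted(comp.items(), key=lambda kv: -kv[1]):
        print(f"{aa}: {pct:.1f}%")
```

Applying the same function to the NP_001138663.1 sequence would be one way to check the reported Leu, Ser, Lys, and Glu percentages.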

== Gene level regulation ==
FAM200B is expressed ubiquitously at moderate levels across human tissues, with relatively higher expression in brain and thymus. Promoter analysis identified the ETC and ETV5::FOXJ1 motifs as high-scoring transcription factor binding sites (scores of 511 and 436). Both transcription factors are known to function in neural development and brain-related regulatory pathways, making them biologically plausible regulators given the higher expression of FAM200B in brain tissue.

== Protein level regulation ==
FAM200B is predicted to be a soluble nuclear protein, with no signal peptide or transmembrane domains and no evidence of secretion or membrane insertion. Post-translational modification predictions indicate multiple serine, threonine, and tyrosine phosphorylation sites and multiple SUMOylation sites, while no evidence was found for lipid anchor attachment, relevant glycosylation, or N-terminal acetylation.

== Homology ==
FAM200B is a vertebrate-specific gene with a conserved paralog, FAM200A, indicating a stable gene family structure across evolution. The two human paralogs, FAM200B and FAM200A, share 79.79% sequence identity and both contain the conserved Domain of Unknown Function 4371 (DUF4371), supporting a common evolutionary origin and functional similarity. Comparative genome analysis shows that FAM200B orthologs are present throughout vertebrates, including mammals, birds, amphibians, and bony fishes, with no clear homologs detected in invertebrate lineages, suggesting that the gene emerged during early vertebrate evolution.
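
Percent identity figures like the one above are derived from a pairwise alignment. The sketch below shows one common convention: identical, non-gap columns divided by the total alignment length. Alignment tools differ in how they treat gaps, so this is not necessarily the exact definition behind the 79.79% value. The two aligned strings are hypothetical toy sequences, not FAM200A or FAM200B.

```python
def percent_identity(aln_a, aln_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, '-' marks a gap).
    Columns containing a gap never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return 100.0 * matches / len(aln_a)


# Hypothetical toy alignment for illustration only.
toy_a = "MKT-LLSE"
toy_b = "MKTALLAE"

if __name__ == "__main__":
    print(f"{percent_identity(toy_a, toy_b):.2f}% identity")
```

Dividing by the length of the shorter ungapped sequence instead of the alignment length is another convention and would yield a different value for the same alignment.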

The earliest identifiable FAM200B orthologs occur in Actinopterygii (ray-finned fish), indicating that the gene originated before the divergence of bony fish and tetrapods approximately 420-450 million years ago. Across vertebrates, the number of family members has remained stable at two paralogs, although distant orthologs show moderate differences in transcript length, exon composition, and alternative splicing patterns. Despite this divergence, the overall sequence and the conserved DUF4371 core are maintained.

Table 2: Twenty orthologs of the human FAM200B protein across vertebrates, including mammals (34-100% identity), birds and reptiles (25-34% identity), amphibians (35-38% identity), and bony fish (37-43% identity).
| | Clade | Genus, Species | Common Name | Taxonomic Group | Divergence Date (MYA) | Accession Number | Query Cover (%) | Sequence Length (aa) | Sequence Identity (%) | Sequence Similarity (%) |
| | Mammalia | Homo sapiens | Human | Primates | 0 | NP_001138663.1 | 100 | 657 | 100 | 100 |
| 1 | | Pan troglodytes | Chimpanzee | Apes | 6.4 | XP_001139775.1 | 100 | 573 | 99 | 99 |
| 2 | | Papio anubis | Olive baboon | Primates | 28.8 | XP_017814067.1 | 100 | 657 | 97 | 98 |
| 3 | | Canis lupus familiaris | Dog | Carnivora | 94 | XP_038335570.0 | 88 | 813 | 91 | 95 |
| 4 | | Monodelphis domestica | Gray short-tailed opossum | Marsupials | 160 | XP_056673701.1 | 75 | 748 | 34 | 54 |
| 5 | Reptilia | Natator depressus | Flatback sea turtle | Testudines | 319 | XP_074809886.1 | 91 | 655 | 34 | 56 |
| 8 | | Chelonia mydas | Green sea turtle | Testudines | 319 | XP_043379535.1 | 98 | 624 | 34 | 57 |
| 6 | Aves | Oxyura jamaicensis | Ruddy duck | Aves | 319 | XP_035169477.1 | 87 | 564 | 27 | 47 |
| 7 | | Caloenas nicobarica | Nicobar pigeon | Aves | 319 | XP_065484009.1 | 89 | 604 | 25 | 44 |
| 9 | Amphibia | Pleurodeles waltl | Iberian ribbed newt | Urodela | 352 | XP_069075336.1 | 92 | 617 | 38 | 59 |
| 10 | | Dendrobates tinctorius | Poison dart frog | Anura | 352 | XP_073431629.1 | 89 | 625 | 38 | 59 |
| 11 | | Ascaphus truei | Tailed frog | Anura | 352 | XP_075472991.1 | 100 | 638 | 37 | 57 |
| 12 | | Rhinatrema bivittatum | | Gymnophiona | 352 | XP_029452623.1 | 92 | 598 | 35 | 56 |
| 13 | Osteichthyes | Trichomycterus rosablanca | Cave catfish | Siluriformes | 426 | XP_062844886.1 | 91 | 614 | 43 | 64 |
| 14 | | Astyanax mexicanus | Mexican tetra | Characiformes | 429 | XP_049334409.1 | 91 | 598 | 42 | 64 |
| 15 | | Anoplopoma fimbria | Sablefish | Scorpaeniformes | 429 | XP_054473507.1 | 85 | 552 | 42 | 64 |
| 16 | | Eleginops maclovinus | Patagonian blennie | Perciformes | 429 | XP_063763934.1 | 99 | 692 | 38 | 59 |
| 17 | | Centroberyx gerrardi | Bright redfish | Beryciformes | 429 | XP_071783535.1 | 91 | 544 | 38 | 58 |
| 18 | | Carassius auratus | Goldfish | Cypriniformes | 429 | XP_026126532.1 | 95 | 633 | 37 | 58 |
| 19 | | Triplophysa rosa | | Cypriniformes | 429 | XP_057204124.1 | 94 | 633 | 39 | 59 |
| 20 | | Megalobrama amblycephala | Wuchang bream | Cypriniformes | 429 | XP_048064547.1 | 85 | 551 | 42 | 63 |
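
The identity range for each clade can be recomputed directly from the Sequence Identity column of Table 2. The following Python sketch groups the values by clade and reports the minimum and maximum. It assumes that rows with an empty Clade cell belong to the nearest labelled clade above them, and the function name `identity_ranges` is hypothetical.

```python
from collections import defaultdict

# (clade, percent identity to human FAM200B) pairs copied from Table 2.
# Assumption: a row with a blank Clade cell inherits the clade labelled above it.
IDENTITY_BY_ROW = [
    ("Mammalia", 100), ("Mammalia", 99), ("Mammalia", 97),
    ("Mammalia", 91), ("Mammalia", 34),
    ("Reptilia", 34), ("Reptilia", 34),
    ("Aves", 27), ("Aves", 25),
    ("Amphibia", 38), ("Amphibia", 38), ("Amphibia", 37), ("Amphibia", 35),
    ("Osteichthyes", 43), ("Osteichthyes", 42), ("Osteichthyes", 42),
    ("Osteichthyes", 38), ("Osteichthyes", 38), ("Osteichthyes", 37),
    ("Osteichthyes", 39), ("Osteichthyes", 42),
]


def identity_ranges(rows):
    """Return {clade: (min identity, max identity)} for the given rows."""
    grouped = defaultdict(list)
    for clade, identity in rows:
        grouped[clade].append(identity)
    return {clade: (min(vals), max(vals)) for clade, vals in grouped.items()}


if __name__ == "__main__":
    for clade, (low, high) in identity_ranges(IDENTITY_BY_ROW).items():
        print(f"{clade}: {low}-{high}% identity")
```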

== Function ==
FAM200B encodes a conserved intracellular protein of unknown function. Sequence and structural analyses indicate that FAM200B lacks catalytic motifs, signal peptides, and transmembrane domains, which suggests that it does not function as an enzyme, secreted factor, or membrane protein. Its predicted nuclear localization, the presence of regulatory post-translational modification sites (including phosphorylation and SUMOylation), and a limited number of zinc-finger-like motifs support a role in regulatory processes, possibly involving protein-protein interactions.

== Interacting proteins ==

Interaction analysis identified few biologically plausible binding partners for FAM200B, most notably ANKRD45 and C1orf198. ANKRD45 contains ankyrin repeat domains that mediate protein-protein interactions, supporting a role for FAM200B within regulatory complexes, while C1orf198 is an uncharacterized protein associated with nuclear and regulatory proteins, suggesting that the two may belong to a shared protein complex. Other predicted partners lack compatible localization or functional context. Together, these results suggest that FAM200B acts as a nuclear regulatory protein that functions through protein-protein interactions within a protein complex.

== Clinical significance ==
FAM200B has no established association with human disease, and no pathogenic variants have been reported. However, its expression under specific cellular stressors suggests that FAM200B may function as a modifier gene influencing the strength, timing, or cellular context of disease-related pathways.
