
Draft:Coiled-coil domain containing 97: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 17:56, 8 July 2024

CCDC97

Gene

Coiled-coil domain containing 97 or CCDC97[1], also known as FLJ40267 and MGC20255, is a protein-coding gene located at 19q13.2 on the plus strand with 6 exons. Orthologs for this gene can be found in mammals, reptiles, amphibians, birds, fish, and invertebrates. Transcript variant 1[2], with 3329 base pairs, encodes the longer protein isoform containing 343 amino acids. The CCDC97 protein isoform 1[3] has a molecular mass of ~39000 Da[4].

Transcription and Protein

The CCDC97 gene is expressed at high levels, 2.4 times more than the average gene. Transcription produces 5 different mRNAs: 3 alternatively spliced variants and 2 unspliced forms. The gene contains 3 non-overlapping alternative last exons and 5 alternative polyadenylation sites.[5] Two spliced mRNAs and the unspliced mRNAs are able to encode 4 good proteins, resulting in 4 isoforms (1 complete and 3 COOH complete), some of which contain the coiled-coil domain containing protein domain (DUF2052).[5]

Evolution

Paralogs

No paralogs for CCDC97 were found on NCBI[1], with or without the use of BLAST[6].

Corrected Protein Sequence Divergence is plotted against Date of Divergence (MYA) to generate a graphical representation of the mutation rate of CCDC97 compared to a gene that is known to change slowly (cytochrome c) and a gene known to mutate quickly (Fibrinogen alpha).
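The article does not state how the corrected divergence in this plot was calculated. As a minimal sketch, assuming a Poisson-type correction of the form m = -ln(fraction of identical sites) × 100, the corrected value could be derived from a pairwise sequence identity as follows. The function name `corrected_divergence` is hypothetical, and the correction formula is an assumption rather than the author's confirmed method.

```python
import math


def corrected_divergence(identity_percent: float) -> float:
    """Return a Poisson-corrected divergence (in percent units).

    Assumption: m = -ln(fraction of identical sites) * 100.
    The source article does not confirm which correction it used.
    """
    if not 0.0 < identity_percent <= 100.0:
        raise ValueError("identity_percent must be in the interval (0, 100]")
    fraction_identical = identity_percent / 100.0
    return -math.log(fraction_identical) * 100.0


if __name__ == "__main__":
    # Identity values taken from the ortholog table (guinea pig and C. elegans).
    for name, identity in [("Cavia porcellus", 89.8), ("Caenorhabditis elegans", 28.2)]:
        print(f"{name}: {corrected_divergence(identity):.1f}")
```

Under this assumption, lower sequence identity maps to a disproportionately larger corrected divergence, which is meant to account for multiple substitutions at the same site.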

Orthologs

Orthologs for CCDC97 can be found in most vertebrates as well as invertebrates. Twenty orthologs from NCBI[7] were collected and compared with human CCDC97 using EMBOSS Needle[8] for pairwise sequence alignment and TimeTree[9] for estimated dates of divergence. As the date of divergence increases, the sequence identity (%) generally decreases as expected, going from mammals (89.8%-59.0%), reptiles (51.5%-51.1%), Aves (41.1%-37.7%), amphibians (48.1%-46.2%), fish (47.5%-40.7%), and invertebrates (28.2%).

Aves were the only group that did not follow the trend, which suggests that this gene has diverged substantially in birds. Amphibians and fish had similar sequence identities, with some fish having higher sequence identity (%) values than amphibians. Invertebrates are the most distantly related to humans, so the low sequence identity of 28.2% for Caenorhabditis elegans was anticipated. CCDC97 is highly conserved and is found in both vertebrates and invertebrates. It most likely appeared in invertebrates around 700 million years ago, because those are the most distantly related organisms in which the protein is known to be present.
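To illustrate how a sequence identity percentage of this kind is derived, here is a minimal sketch that computes it from two already-aligned sequences. It follows the convention EMBOSS Needle uses in its report, where identity is the number of identical positions divided by the full alignment length, gaps included. The function name `percent_identity` and the toy sequences are hypothetical; they are not CCDC97 data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Identity = identical (non-gap) columns / total alignment columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical toy alignment: 5 identical columns out of 7.
    print(f"{percent_identity('MKT-LLA', 'MKSALLA'):.1f}%")
```

Because gap columns count toward the alignment length, orthologs that differ greatly in length from the human protein can score lower even where the shared regions align well.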

| CCDC97 | Genus and Species | Common Name | Taxonomic Group | Median Date of Divergence (MYA) | Accession Number | Sequence Length (aa) | Sequence Identity (%) | Sequence Similarity (%) |
|---|---|---|---|---|---|---|---|---|
| Mammals | Homo sapiens | Humans | Primates | 0 | NM_052848 | 343 | 100% | 100% |
| Mammals | Cavia porcellus | Domestic Guinea Pig | Rodentia | 87 | XP_003462073 | 342 | 89.80% | 94.20% |
| Mammals | Physeter catodon | Sperm Whale | Cetartiodactyla | 94 | XP_007128179 | 347 | 88.80% | 91.40% |
| Mammals | Artibeus jamaicensis | Jamaican Fruit Bat | Chiroptera | 94 | XP_037013554 | 361 | 84.80% | 87.30% |
| Mammals | Sarcophilus harrisii | Tasmanian Devil | Dasyuromorphia | 160 | XP_031819750 | 332 | 64.40% | 75.60% |
| Mammals | Tachyglossus aculeatus | Australian Echidna | Monotremata | 180 | XP_038623271 | 330 | 59.00% | 68.10% |
| Reptilia | Python bivittatus | Burmese Python | Squamata | 319 | XP_007421554 | 345 | 51.50% | 62.90% |
| Reptilia | Alligator mississippiensis | American Alligator | Crocodilia | 319 | XP_059574710 | 309 | 51.30% | 63.00% |
| Reptilia | Varanus komodoensis | Komodo Dragon | Squamata | 319 | XP_044291280 | 387 | 51.10% | 60.20% |
| Aves | Accipiter gentilis | Northern Goshawk | Accipitriformes | 319 | XP_049652563 | 303 | 41.10% | 50.30% |
| Aves | Phalacrocorax carbo | Great Cormorant | Suliformes | 319 | XP_064296149 | 317 | 37.70% | 46.30% |
| Amphibians | Xenopus tropicalis | Tropical Clawed Frog | Anura | 325 | XP_012823864 | 300 | 46.20% | 61.90% |
| Amphibians | Rhinatrema bivittatum | Rhinatrema bivittatum | Gymnophiona | 352 | XP_029475649 | 308 | 48.70% | 63.90% |
| Amphibians | Microcaecilia unicolor | Microcaecilia unicolor | Gymnophiona | 352 | XP_030075449 | 315 | 47.10% | 61.40% |
| Fish | Protopterus annectens | West African Lungfish | Lepidosireniformes | 408 | XP_043933492 | 354 | 45.20% | 60.20% |
| Fish | Latimeria chalumnae | Coelacanth | Coelacanthiformes | 415 | XP_014349074 | 339 | 47.50% | 63.70% |
| Fish | Acipenser ruthenus | Sterlet | Acipenseriformes | 429 | XP_033881880 | 363 | 46.30% | 57.90% |
| Fish | Leucoraja erinacea | Little Skate | Rajiformes | 462 | XP_055519601 | 344 | 46.10% | 63.30% |
| Fish | Callorhinchus milii | Elephant Shark | Chimaeriformes | 462 | XP_007909130 | 326 | 45.00% | 61.20% |
| Fish | Petromyzon marinus | Sea Lamprey | Petromyzontiformes | 563 | XP_032821086 | 314 | 40.70% | 57.10% |
| Invertebrates | Caenorhabditis elegans | Caenorhabditis elegans | Rhabditida | 708 | NP_506468 | 301 | 28.20% | 45.50% |

Promoter

Table 2: CCDC97 transcription factor key[10]

| Name | Class | Family |
|---|---|---|
| KLF3 | C2H2 zinc finger factors | Three-zinc finger Kruppel-related |
| ZNF454 | C2H2 zinc finger factors | More than 3 adjacent zinc fingers |
| Thap11 | C2CH THAP-type zinc finger factors | THAP-related factors |
| SOX14 | High-mobility group (HMG) domain factors | SOX-related factors |
| PKNOX1 | Homeo domain factors | TALE-type homeo domain factors |
| ZNF530 | C2H2 zinc finger factors | More than 3 adjacent zinc fingers |
| Nrf1 | Basic leucine zipper factors (bZIP) | Jun-related |
| ZNF213 | C2H2 zinc finger factors | More than 3 adjacent zinc fingers |


Secondary Structures

CCDC97 annotated 5' UTR
CCDC97 annotated 3’ UTR with miRNA (Black boxes) and RBPDB (Red circles)
CCDC97 annotated 3' UTR
Table 3: Top scoring microRNA found for CCDC97[11]

| Name | Score | Sequence |
|---|---|---|
| hsa-miR-486-3p | 99 | ctgcccca |
| hsa-miR-30a-5p | 99 | tgtttaca |
| hsa-miR-8085 | 98 | ctctccc |
| hsa-miR-4524a-3p | 97 | ctgtctc |
| hsa-miR-450a-2-3p | 92 | tccccaa |
Table 4: Top scoring RBPDB found for CCDC97[12]

| Name | Score | Sequence |
|---|---|---|
| A2BP1 | 11.1 | UGCAUG |
| HNRNPA1 | 9.9 | UAGGGA |
| NONO | 8.9 | AGGGA |
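The site sequences in Tables 3 and 4 can be located in a UTR by simple exact string matching. The sketch below is illustrative only: the function name `find_sites` and the short example UTR are hypothetical, not the real CCDC97 3' UTR, and exact matching is an assumption that ignores the scoring models miRDB and RBPDB actually apply. Because Table 3 lists sites in lower-case DNA letters and Table 4 in RNA letters, both inputs are normalized to upper-case RNA first.

```python
def to_rna(sequence: str) -> str:
    """Normalize a DNA or RNA string to upper-case RNA (T becomes U)."""
    return sequence.upper().replace("T", "U")


def find_sites(utr: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (overlapping) exact match."""
    utr_rna, motif_rna = to_rna(utr), to_rna(motif)
    positions = []
    start = utr_rna.find(motif_rna)
    while start != -1:
        positions.append(start)
        start = utr_rna.find(motif_rna, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical 25-nt sequence used purely for demonstration.
    example_utr = "AACTGCCCCATTTGCATGAAAGGGA"
    for name, motif in [
        ("hsa-miR-486-3p", "ctgcccca"),
        ("A2BP1", "UGCAUG"),
        ("NONO", "AGGGA"),
    ]:
        print(name, find_sites(example_utr, motif))
```

Positions returned this way could then be compared against the annotated 3' UTR figure, where miRNA sites are marked with black boxes and RBPDB sites with red circles.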

References

  1. ^ a b "Homo sapiens coiled-coil domain containing 97, mRNA (cDNA clone MGC:20255 IMAGE:4651484), complete cds". NCBI - Nucleotide (National Center for Biotechnology Information).
  2. ^ "Homo sapiens coiled-coil domain containing 97 (CCDC97), transcript variant 1, mRNA". NCBI - Nucleotide (National Center for Biotechnology Information).
  3. ^ "coiled-coil domain-containing protein 97 isoform 1 [Homo sapiens]". NCBI - Protein (National Center for Biotechnology Information).
  4. ^ "CCDC97 Gene - Coiled-Coil Domain Containing 97". GeneCards.
  5. ^ a b "Homo sapiens gene CCDC97, encoding coiled-coil domain containing 97". AceView.
  6. ^ "Basic Local Alignment Search Tool". NCBI BLAST.
  7. ^ "NCBI Orthologs". NCBI (National Center for Biotechnology Information).
  8. ^ "EMBOSS Needle". EMBOSS Needle (Pairwise Sequence Alignment (PSA)).
  9. ^ "Timetree". Timetree (The Timescale of Life).
  10. ^ "JASPAR entry on CCDC97". JASPAR 2024.
  11. ^ "CCDC97 miRNA". miRDB.
  12. ^ "CCDC97 RBPDB". RBPDB.