SNED1

SNED1 is a human protein expressed at low levels in a wide range of tissues. The protein is soluble and is found in circulating blood. The conceptually translated protein contains four types of domain of interest: a nidogen (NIDO) domain, three fibronectin type III (FN3) domains, several calcium-binding EGF-like (EGF-CA) domains, and one complement control protein (CCP) domain. The gene is found on chromosome 2 at locus 2q37.3. The mRNA was isolated from the spleen and is 6834bp in length. The conceptually translated protein is 1178aa long. This protein is predicted to interact with somatostatin, spermine synthase and TMEM132C.^[1]

Gene

Locus

SNED1 is located on the plus strand of chromosome 2 at locus 2q37.3. The RefSeq identification number is NM_001080437.1. The genomic DNA sequence of SNED1 contains 96,729bp, and the longest spliced mRNA predicted by AceView is 7048bp and contains 31 exons. Nine splice variants of SNED1 produced protein structure matches in the Phyre2 database; these are discussed under "Tertiary and quaternary structure".^[2]

Common aliases

SNED1 is an acronym for "sushi, nidogen, and EGF-like domains". Aliases for SNED1 include Snep, SST3, and IRE-BP1.^[2]

Homology/evolution

Homologs and phylogeny

SNED1 is very highly conserved throughout evolutionary history, and this conservation is seen across a wide range of taxa, from mammals and other vertebrates to invertebrates. The abundance of cysteine residues also appears to be very highly conserved, suggesting that cysteine richness is an important feature of this protein.^[3]

Paralogs

SNED1 has a number of paralogs within the human genome, each of which covers only a small portion of the entire peptide sequence. No BLAST hit covered 100% of the query. Most hits fell in the range of 50-70% query coverage, and maximum identity did not exceed 65%. Human genes that are similar to the conserved domains in SNED1 include: neurogenic locus notch homolog isoforms, protein jagged precursors, protein eyes shut homolog isoforms, protein crumbs homolog isoforms, delta and notch-like epidermal growth factor receptor, sushi von Willebrand factor A, and slit homolog 3 protein.

Protein

Primary sequence

The peptide sequence with the longest ORF was found by creating a conceptual translation using the SIXFRAME tool at the SDSC Biology Workbench website, and this sequence was used for most analyses. The full sequence obtained by an NCBI BLAST search can be accessed with the reference ID NP_001073906.1. One presumably important feature of this protein is that it is extraordinarily cysteine rich, with 106 cysteines in total, giving an overall cysteine composition of 9.0%.^[4]
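The composition figure follows directly from the counts quoted above (106 cysteines in a 1178aa protein). The short Python sketch below reproduces that arithmetic and shows how the same percentage could be computed from a sequence string. The function name `residue_percent` and the toy peptide are illustrative assumptions and are not taken from the original analysis.

```python
from collections import Counter


def residue_percent(sequence: str, residue: str) -> float:
    """Return the percentage of `sequence` made up of `residue`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return 100.0 * Counter(seq)[residue.upper()] / len(seq)


# Arithmetic check using the counts quoted in the text.
quoted_percent = 100.0 * 106 / 1178
print(f"{quoted_percent:.1f}%")  # 9.0%

# Toy peptide (hypothetical, NOT the SNED1 sequence): 3 cysteines in 8 residues.
print(residue_percent("MCCAGCKL", "C"))  # 37.5
```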

Domains and motifs

There are several domains of interest in this protein. The first, shown in pink in the annotated sequence above, is the nidogen (NIDO) domain, an extracellular domain of unknown function found in nidogen-1 (also known as entactin). The second regions of interest, shown underlined, are the calcium-binding EGF domains (EGF-CA). There are many of these domains in the sequence, and they are present in a large number of membrane-bound and extracellular proteins. The EGF-CA domains may suggest a "sticky" nature for this protein, as extracellular matrix (ECM) proteins often require calcium cations to form homo- and heterodimeric complexes with other ECM proteins. The complement control protein (CCP) motif is annotated in green in the figure; this domain has been identified in many proteins involved in the complement system. Other names for this domain include short consensus repeats (SCRs) and the Sushi domain, from which the protein gets its name. The fibronectin type III domain (FN3) is annotated in blue, and its presence suggests that the protein may be involved in cell adhesion. The FN3 domain contains internal repeats that are present in the plasma protein fibronectin. This particular domain contains the RGD sequence, which is important in the binding of ECM proteins to integrins found in cell membranes and is therefore an important feature in cellular adhesion.
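As a minimal illustration of how a short motif such as RGD can be located in a peptide sequence, the following Python sketch reports the 1-based start position of every match. The function name `find_motif` and the toy sequence are hypothetical and are not part of the original analysis; the real SNED1 sequence would be substituted in practice.

```python
import re


def find_motif(sequence: str, motif: str = "RGD") -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets overlapping occurrences be reported too.
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


# Toy peptide (hypothetical, NOT the SNED1 sequence).
toy_sequence = "MKTRGDSLLCRGDA"
print(find_motif(toy_sequence))  # [4, 11]
```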

Post-translational modifications

Only a few predicted kinase-dependent phosphorylation sites scored above 0.8 in the NetPhosK program from the ExPASy bioinformatics suite of proteomics tools. These sites are annotated with yellow highlight in the conceptual translation above. All of these sites are predicted to be phosphorylated by either protein kinase A (PKA) or protein kinase C (PKC).
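The score cut-off described above amounts to a simple filter over predicted sites. The Python sketch below shows one way to apply it; the record layout, the function name `above_threshold`, and every position and score in the example list are hypothetical placeholders, not NetPhosK output for SNED1.

```python
from typing import NamedTuple


class SitePrediction(NamedTuple):
    position: int   # 1-based residue position (hypothetical values below)
    residue: str    # phosphorylated residue, e.g. "S" or "T"
    kinase: str     # predicted kinase, e.g. "PKA" or "PKC"
    score: float    # prediction score between 0 and 1


def above_threshold(predictions, threshold=0.8):
    """Keep only predictions whose score is strictly greater than `threshold`."""
    return [p for p in predictions if p.score > threshold]


# Hypothetical example records, for illustration only.
example = [
    SitePrediction(12, "S", "PKA", 0.91),
    SitePrediction(40, "T", "PKC", 0.55),
    SitePrediction(103, "T", "PKC", 0.83),
    SitePrediction(250, "S", "PKA", 0.80),
]

kept = above_threshold(example)
print([(p.position, p.kinase) for p in kept])  # [(12, 'PKA'), (103, 'PKC')]
```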

Secondary structure

The amino acid sequence of the longest variant is extremely cysteine rich, presumably resulting in extensive disulfide bond formation. There is no long, organized run of alpha helices, but there is a cluster of alpha helices toward the C-terminus. The beta sheets are annotated as purple text in the conceptual translation, and the alpha helices are annotated as red text.

Tertiary and quaternary structure

The program Phyre2 was used to construct structure predictions for the conserved domain regions NIDO, CCP, and FN3, as well as for each of the splice variants. Some of the results are consistent with the proposed function of an extracellular "sticky" protein possibly involved in cell-cell adhesion or in clotting. Protein matches found in Phyre2 comprise an array of proteins with the following functions: clotting, hydrolysis, plasminogen activation, hormone/growth factor activity, protein binding, cell adhesion, and ECM structure.

Splice variants a, b, and e, in Figures 5 and 6, have >99% structural similarity to the protein neurexin 1-alpha (NRXN1). Neurexins are cell adhesion molecules and often contain EGF binding domains, enhancing the formation of intercellular junctions. NRXN1 is also proposed to play a role in angiogenesis. Alpha-neurexins interact with neurexophilins and possibly function in the synaptic junctions of the vertebrate nervous system. Alpha-neurexins often use alternative promoters and splice sites, resulting in many different transcripts from one gene, which may explain this gene's abundance of alternative transcripts.

Splice variant d has a 100% structural match to Low density lipoprotein receptor-related protein 4 (LRP4). This protein is involved in SOST-mediated bone formation inhibition and inhibition of Wnt signaling. LRP4 plays an important role in the formation of neuromuscular junctions.

Splice variants f and g have >99% similarity to fibrillin-1, an ECM protein that is a structural component of calcium binding microfibrils.

Splice variant i and the conserved domain CCP are >99% structurally similar to tissue plasminogen activator (PLAT). PLAT is secreted by vascular endothelial cells and acts as a serine protease that converts plasminogen to plasmin. Plasmin is a fibrinolytic enzyme that aids in the breakdown of blood clots, and recombinant PLAT is used clinically for that purpose.

The conserved domain NIDO was >99% similar to coagulation factor IX, also known as Factor IX (F9). F9 is a secreted coagulation factor involved in the clotting cascade that requires activation by other coagulation factors within the cascade.

The three consecutive conserved FN3 domains together are 100% similar, with 100% coverage, to anosmin-1. Anosmin-1 is an ECM glycoprotein responsible for normal development of the brain, spinal cord and kidney.

Interacting proteins

The STRING (Known and Predicted Protein Interactions) database was used to identify proteins that may interact with SNED1. The following proteins were candidates for interaction: somatostatin (SST), somatostatin receptor 2 (SSTR2) as well as a variety of other somatostatin receptors,^[5] spermine synthase (SMS), and TMEM132C. All of the somatostatin-related proteins are involved in the inhibition of hormones. Very little is known about TMEM132C, and all publications related to the protein are large-scale genome screens. The protein expression profile of TMEM132C is very similar to that of SNED1, with protein abundance found in blood plasma, platelets, and liver. All of the interacting proteins described are expressed in these three common areas.

Expression

SNED1 is ubiquitously expressed at intermediate levels, making it unclear from RNA expression profiles which cells secrete SNED1. The protein expression profiles of SNED1 from MOPED (Multi-Omics Profiling Expression Database) and PaxDb (Protein Abundance Across Organisms database) indicate that the protein is found in blood serum, blood plasma, blood T-lymphocytes, platelets, HEK-293 kidney cells, and liver, and at low levels in the brain.

Transcript variants

The program AceView was used to predict transcript variants, shown in Figure 6. There are nine spliced forms and three unspliced forms. Three of the transcript variants, b, c, and e, contain green regions that represent upstream open reading frames (uORFs), which suggests that these transcripts contain regulatory elements upstream of the main coding region. All of the spliced transcript variants a-i were analyzed with the Phyre2 server to predict protein structure; see "Tertiary and quaternary structure".

Promoter

The promoter was predicted and analyzed for transcription factor binding sites using the ElDorado software on the Genomatix software suite. There were alternative promoters downstream of the selected 845bp promoter.

Transcription factors

The following transcription factor binding sites were found in the ElDorado-predicted promoter with a matrix similarity of 1.00, with the entire binding domain matched.

Matrix Family	Detailed Family Information	Matrix	Detailed Matrix information	Strand	Matrix similarity	Sequence
BRAC	Brachyury gene, mesoderm developmental factor	TBX20.01	T-box transcription factor TBX20	(-)	1.00	gcatcgcggAGGTgtgcgggcgg
TF2B	RNA polymerase II transcription factor II B	BRE.01	Transcription factor II B (TFIIB) recognition element	(-/+)	1.00	ccgCGCC
XCPE	Activator-, mediator-, and TBP-dependent core promoter element for RNA polymerase II transcription from TATA-less promoters	XCPE1.01	X gene core promoter element 1	(-)	1.00	ggGCGGgaccg
ZF02	C2H2 zinc finger transcription factors 2	ZKSCAN3.01	Zinc finger with KRAB and SCAN domains 3	(+)	1.00	catggCCCCaccacagggcgcgc
SP1F	GC-Box factors SP1/GC	SP1.03	Stimulating protein 1, ubiquitous zinc finger transcription factor	(-)	1.00	cggggGGGCggggccat
PLAG	Pleomorphic adenoma gene	PLAG1.02	Pleomorphic adenoma gene 1	(+)	1.00	aaGGGGgcagcacggaacgggtt
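
In the Sequence column of the table, part of each site is written in capital letters. Assuming that the capitals mark the core of each matrix match (an assumption about the notation, which the text above does not state), the core can be pulled out of each site string with a short Python sketch such as the one below. The function name `uppercase_core` is illustrative only.

```python
import re

# Site sequences copied from the table above, keyed by matrix name.
sites = {
    "TBX20.01": "gcatcgcggAGGTgtgcgggcgg",
    "BRE.01": "ccgCGCC",
    "XCPE1.01": "ggGCGGgaccg",
    "ZKSCAN3.01": "catggCCCCaccacagggcgcgc",
    "SP1.03": "cggggGGGCggggccat",
    "PLAG1.02": "aaGGGGgcagcacggaacgggtt",
}


def uppercase_core(site: str) -> str:
    """Return the first run of capital letters in `site` ('' if there is none)."""
    match = re.search(r"[A-Z]+", site)
    return match.group(0) if match else ""


cores = {name: uppercase_core(seq) for name, seq in sites.items()}
for name, core in cores.items():
    print(name, core)
```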

Proposed function

Based on the analysis described above, particularly in the "Tertiary and quaternary structure" section, some deductions can be made about the role of SNED1. SNED1 is a secreted protein that circulates in the blood plasma and may be a component of the liver ECM. It shows strong structural similarity to other proteins involved in clotting and cell adhesion. One domain in particular has high structural similarity to fibrillin, which contains an RGD sequence known to be involved in adhesion to membrane-bound integrins. The structural similarities to neurexin-1 further support a cell adhesion role; because neurexin-1 is also thought to be involved in angiogenesis, the presence of SNED1 in the blood circulation would be convenient for promoting new vessel formation. This protein likely has the property of "stickiness", in that it can readily bind to matrix components or other cells. Because the protein is found in circulating blood and is structurally similar to proteins involved in the clotting cascade and in cell adhesion, SNED1 may be involved in the response to injury by adhering to sites of vessel damage and interacting with the newly forming clot, possibly signalling to promote new blood vessel formation at the damage site. This proposal is purely speculative and will require experimental support.

Clinical significance

Select cases in NCBI's GEO Profiles highlight some clinically relevant data on SNED1 expression levels in response to certain conditions. In aldosterone-producing adenoma versus control lung tissue, SNED1 expression decreased about 25-fold in the adenoma tissue. In a developmental study of the transition from oligodendrocyte precursors to mature oligodendrocytes, expression decreased almost 100-fold upon differentiation. It may be interesting to explore expression in clotting disorders or other blood-related diseases.

References

[1] Leimeister, C; Schumacher N; Diez H; Gessler M (June 2004). "Cloning and expression analysis of the mouse stroma marker Snep encoding a novel nidogen domain protein". Developmental Dynamics. 230 (2): 371–377. doi:10.1002/dvdy.20056. PMID 15162516. Retrieved 05.02.13.
[2] "GeneCards". Weizmann Institute of Science. Retrieved 2013-05-13.
[3] Thompson, Julie D.; Higgins, D.G.; Gibson, T.J. (11 November 1994). "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice". Nucleic Acids Research. 22 (22): 4673–4680. doi:10.1093/nar/22.22.4673. PMC 308517. PMID 7984417.
[4] Brendel, V. "Methods and algorithms for statistical analysis of protein sequences". Board of Trustees of the University of Illinois. Retrieved 2013-05-13.
[5] Hannon, JP; Nunn C; Stolz B; Bruns C; Weckbecker G; Lewis I; Troxler T; Hurth K; Hoyer D (Feb–Apr 2002). "Drug design at peptide receptors: somatostatin receptor ligands". Journal of Molecular Neuroscience. 18 (1–2): 15–27. doi:10.1385/JMN:18:1-2:15. PMID 11931345. Retrieved 2013-05-13.
