Superfamily (proteins)

Description: The SUPERFAMILY database provides structural and functional annotation for all proteins and genomes.
Data types captured: Protein families, genome annotation, alignments, hidden Markov models (HMMs)
Organisms: All
Research center: University of Bristol
Primary citation: PMID 19036790[1]
Data format: FASTA format
Download URL:
License: GNU General Public License
Version: 1.75

SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.[1][2][3][4][5][6][7] It classifies amino acid sequences into known structural domains, in particular into SCOP superfamilies.[8][9] A superfamily is a group of proteins for which structural evidence supports a common evolutionary ancestor, even where sequence homology may not be detectable.

Annotations in SUPERFAMILY

The SUPERFAMILY annotation is based on a collection of hidden Markov models, which represent structural protein domains at the SCOP superfamily level.[10] A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from completely sequenced genomes against the hidden Markov models.
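The idea of scanning a sequence against a library of models can be illustrated with a minimal sketch. It is not SUPERFAMILY's actual pipeline: the real models are profile hidden Markov models with insert and delete states, whereas this example assumes a simplified match-only profile (one column of residue probabilities per position), a uniform background distribution and a score threshold of zero. The function names, the toy models and the example sequence are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def log_odds_profile(columns, pseudocount=0.01):
    """Convert per-position residue probabilities into log-odds scores."""
    profile = []
    for col in columns:
        total = sum(col.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((col.get(aa, 0.0) + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (score, start) of the best window."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        # Residues outside the 20 standard amino acids contribute a neutral score of 0.
        score = sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        best = max(best, (score, start))
    return best


def scan(library, sequence, threshold=0.0):
    """Score the sequence against every model; keep hits above the threshold."""
    hits = []
    for name, profile in library.items():
        score, start = best_hit(profile, sequence)
        if score > threshold:
            hits.append((name, start, start + len(profile), score))
    return sorted(hits, key=lambda hit: -hit[3])


# Hypothetical two-model library and query sequence, for illustration only.
library = {
    "toy_model_a": log_odds_profile([{"G": 1.0}, {"K": 1.0}, {"S": 1.0}]),
    "toy_model_b": log_odds_profile([{"W": 1.0}, {"W": 1.0}, {"W": 1.0}]),
}
hits = scan(library, "MAGKSTL")
for name, start, end, score in hits:
    print(f"{name}: residues {start}-{end}, score {score:.2f}")
```

Each hit records the model name, the matched region (half-open, zero-based) and the score. A real profile HMM search would also report a statistical significance measure, which this sketch omits.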

For each protein you can:

  • Submit sequences for SCOP classification
  • View domain organisation, sequence alignments and protein sequence details

For each genome you can:

  • Examine superfamily assignments, phylogenetic trees, domain organisation lists and networks
  • Check for over- and under-represented superfamilies within a genome
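One simple way to think about over- and under-representation is to compare the share of a genome's domain assignments that fall in each superfamily with the corresponding share in a set of reference genomes. The sketch below does this with a log2 ratio and a pseudocount. The statistic SUPERFAMILY itself uses is not described here, so this is an illustrative assumption rather than the database's method, and the function name, superfamily names and counts are hypothetical.

```python
import math


def representation(genome_counts, reference_counts, pseudocount=1.0):
    """Return the log2 ratio of each superfamily's share in one genome
    to its share in the pooled reference genomes.

    Positive values suggest over-representation, negative values
    under-representation, relative to the references.
    """
    names = set(genome_counts)
    for ref in reference_counts:
        names |= set(ref)

    k = len(names)
    genome_total = sum(genome_counts.values()) + pseudocount * k
    ref_total = sum(sum(ref.values()) for ref in reference_counts) + pseudocount * k

    scores = {}
    for name in names:
        genome_share = (genome_counts.get(name, 0) + pseudocount) / genome_total
        ref_pooled = sum(ref.get(name, 0) for ref in reference_counts)
        ref_share = (ref_pooled + pseudocount) / ref_total
        scores[name] = math.log2(genome_share / ref_share)
    return scores


# Hypothetical domain-assignment counts per superfamily.
genome = {"sf_alpha": 30, "sf_beta": 10}
references = [{"sf_alpha": 10, "sf_beta": 30}, {"sf_alpha": 10, "sf_beta": 30}]
scores = representation(genome, references)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:+.2f}")
```

The pseudocount keeps the ratio defined when a superfamily is absent from the genome or from the references. A fuller analysis would add a significance test, which is outside the scope of this sketch.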

For each superfamily you can:

  • Inspect the SCOP classification, functional annotation, Gene Ontology annotation,[5][11] InterPro abstract and genome assignments
  • Explore taxonomic distribution of a superfamily across the tree of life

All annotations, models and the database dump are freely available for download.


References

  1. Wilson, D.; Pethica, R.; Zhou, Y.; Talbot, C.; Vogel, C.; Madera, M.; Chothia, C.; Gough, J. (2009). "SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny". Nucleic Acids Research 37 (Database issue): D380–D386. doi:10.1093/nar/gkn762. PMC 2686452. PMID 19036790.
  2. Wilson, D.; Madera, M.; Vogel, C.; Chothia, C.; Gough, J. (2007). "The SUPERFAMILY database in 2007: Families and functions". Nucleic Acids Research 35 (Database issue): D308–D313. doi:10.1093/nar/gkl910. PMC 1669749. PMID 17098927.
  3. Gough, J. (2002). "The SUPERFAMILY database in structural genomics". Acta Crystallographica Section D: Biological Crystallography 58 (Pt 11): 1897–1900. doi:10.1107/s0907444902015160. PMID 12393919.
  4. Gough, J.; Chothia, C. (2002). "SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments". Nucleic Acids Research 30 (1): 268–272. doi:10.1093/nar/30.1.268. PMC 99153. PMID 11752312.
  5. De Lima Morais, D. A.; Fang, H.; Rackham, O. J. L.; Wilson, D.; Pethica, R.; Chothia, C.; Gough, J. (2010). "SUPERFAMILY 1.75 including a domain-centric gene ontology method". Nucleic Acids Research 39 (Database issue): D427–D434. doi:10.1093/nar/gkq1130. PMC 3013712. PMID 21062816.
  6. Superfamily on Twitter
  7. Oates, M. E.; Stahlhacke, J.; Vavoulis, D. V.; Smithers, B.; Rackham, O. J.; Sardar, A. J.; Zaucha, J.; Thurlby, N.; Fang, H.; Gough, J. (2015). "The SUPERFAMILY 1.75 database in 2014: A doubling of data". Nucleic Acids Research 43 (Database issue): D227–D233. doi:10.1093/nar/gku1041. PMID 25414345.
  8. Hubbard, T. J.; Ailey, B.; Brenner, S. E.; Murzin, A. G.; Chothia, C. (1999). "SCOP: A Structural Classification of Proteins database". Nucleic Acids Research 27 (1): 254–256. doi:10.1093/nar/27.1.254. PMC 148149. PMID 9847194.
  9. Lo Conte, L.; Ailey, B.; Hubbard, T. J.; Brenner, S. E.; Murzin, A. G.; Chothia, C. (2000). "SCOP: A Structural Classification of Proteins database". Nucleic Acids Research 28 (1): 257–259. doi:10.1093/nar/28.1.257. PMC 102479. PMID 10592240.
  10. Gough, J.; Karplus, K.; Hughey, R.; Chothia, C. (2001). "Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure". Journal of Molecular Biology 313 (4): 903–919. doi:10.1006/jmbi.2001.5080. PMID 11697912.
  11. Botstein, D.; Cherry, J. M.; Ashburner, M.; Ball, C. A.; Blake, J. A.; Butler, H.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M.; Sherlock, G. (2000). "Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium". Nature Genetics 25 (1): 25–29. doi:10.1038/75556. PMC 3037419. PMID 10802651.