Superfamily (proteins)

Description: The SUPERFAMILY database provides structural and functional annotation for all proteins and genomes.
Data types captured: Protein families, genome annotation, alignments, HMMs
Organisms: all
Research center: University of Bristol
Laboratory: Julian Gough
Primary citation: PMID 19036790
Data format: FASTA format
Website: SUPERFAMILY web site
Download URL: SUPERFAMILY downloads
License: GNU General Public License
Version: 1.75

SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.[1][2][3][4][5]

The SUPERFAMILY annotation is based on a collection of hidden Markov models, which represent structural protein domains at the SCOP superfamily level.[6] A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from completely sequenced genomes against the hidden Markov models.
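
The scanning step described above can be illustrated with a minimal sketch. It assumes toy two-state HMMs over a reduced two-letter alphabet (H for hydrophobic, P for polar) rather than the actual SUPERFAMILY model library, and the names `forward_log_likelihood`, `best_model`, `runs` and `alternating` are hypothetical. Real profile HMMs have match, insert and delete states for each alignment column, and real scans score hits against a null model with a significance cutoff; neither is shown here.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, model):
    """Return log P(seq | model) using the forward algorithm."""
    if not seq:
        raise ValueError("sequence must not be empty")
    states = model["states"]
    start, trans, emit = model["start"], model["trans"], model["emit"]
    # alpha[s] = log P(symbols seen so far, current state = s)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: math.log(emit[s][symbol])
            + log_sum_exp([alpha[r] + math.log(trans[r][s]) for r in states])
            for s in states
        }
    return log_sum_exp(list(alpha.values()))


def best_model(seq, models):
    """Score seq against every model and return (name, log-likelihood) of the best."""
    scores = {name: forward_log_likelihood(seq, m) for name, m in models.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical toy emissions shared by both models (assumption, not real data).
EMIT = {"s1": {"H": 0.9, "P": 0.1}, "s2": {"H": 0.1, "P": 0.9}}
START = {"s1": 0.5, "s2": 0.5}

MODELS = {
    # Favours long runs of the same residue class.
    "runs": {
        "states": ["s1", "s2"], "start": START, "emit": EMIT,
        "trans": {"s1": {"s1": 0.9, "s2": 0.1}, "s2": {"s1": 0.1, "s2": 0.9}},
    },
    # Favours strict alternation between residue classes.
    "alternating": {
        "states": ["s1", "s2"], "start": START, "emit": EMIT,
        "trans": {"s1": {"s1": 0.1, "s2": 0.9}, "s2": {"s1": 0.9, "s2": 0.1}},
    },
}

if __name__ == "__main__":
    for sequence in ("HHHHPPPP", "HPHPHPHP"):
        print(sequence, best_model(sequence, MODELS))
```

Each sequence is assigned to whichever model gives it the highest likelihood, which is the same idea, in miniature, as assigning a protein to the superfamily model that matches it best.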

For each protein you can:

  • Submit sequences for SCOP classification
  • View domain organisation, sequence alignments and protein sequence details

For each genome you can:

  • Examine superfamily assignments, phylogenetic trees, domain organisation lists and networks
  • Check for over- and under-represented superfamilies within a genome

For each superfamily you can:

  • Inspect SCOP classification, functional annotation, Gene Ontology annotation, InterPro abstract and genome assignments
  • Explore taxonomic distribution of a superfamily across the tree of life

All annotation, models and the database dump are freely available for download.
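
As an illustration of the over- and under-representation check listed for genomes, the sketch below compares each superfamily's share of a genome's domain assignments with its share in a set of background genomes. The z-score statistic, the function name `representation_z_scores` and the example counts are all assumptions made for illustration; the text here does not say which statistic SUPERFAMILY itself uses.

```python
from statistics import mean, stdev


def fractions(counts):
    """Convert raw domain counts per superfamily into fractions of the total."""
    total = sum(counts.values())
    return {sf: n / total for sf, n in counts.items()}


def representation_z_scores(genome_counts, background_counts):
    """Z-score of each superfamily's share in a genome against background genomes.

    Positive values suggest over-representation, negative values
    under-representation. Superfamilies whose background share does not
    vary (zero standard deviation) are skipped.
    """
    genome = fractions(genome_counts)
    background = [fractions(c) for c in background_counts]
    z_scores = {}
    for sf, share in genome.items():
        # A superfamily absent from a background genome counts as share 0.
        shares = [b.get(sf, 0.0) for b in background]
        spread = stdev(shares)
        if spread > 0:
            z_scores[sf] = (share - mean(shares)) / spread
    return z_scores


if __name__ == "__main__":
    # Hypothetical counts of domain assignments per superfamily.
    background = [{"a": 10, "b": 10}, {"a": 12, "b": 8}, {"a": 8, "b": 12}]
    genome = {"a": 18, "b": 2}
    for sf, z in sorted(representation_z_scores(genome, background).items()):
        print(sf, round(z, 2))
```

Working with fractions rather than raw counts keeps genomes of different sizes comparable; at least two background genomes are needed for the standard deviation to be defined.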


SUPERFAMILY classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. The superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

References


  1. Wilson, D.; Pethica, R.; Zhou, Y.; Talbot, C.; Vogel, C.; Madera, M.; Chothia, C.; Gough, J. (2009). "SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny". Nucleic Acids Research 37 (Database issue): D380–D386. doi:10.1093/nar/gkn762. PMC 2686452. PMID 19036790.
  2. Wilson, D.; Madera, M.; Vogel, C.; Chothia, C.; Gough, J. (2007). "The SUPERFAMILY database in 2007: Families and functions". Nucleic Acids Research 35 (Database issue): D308–D313. doi:10.1093/nar/gkl910. PMC 1669749. PMID 17098927.
  3. Gough, J. (2002). "The SUPERFAMILY database in structural genomics". Acta Crystallographica. Section D, Biological Crystallography 58 (Pt 11): 1897–1900. doi:10.1107/s0907444902015160. PMID 12393919.
  4. Gough, J.; Chothia, C. (2002). "SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments". Nucleic Acids Research 30 (1): 268–272. doi:10.1093/nar/30.1.268. PMC 99153. PMID 11752312.
  5. De Lima Morais, D. A.; Fang, H.; Rackham, O. J. L.; Wilson, D.; Pethica, R.; Chothia, C.; Gough, J. (2010). "SUPERFAMILY 1.75 including a domain-centric gene ontology method". Nucleic Acids Research 39 (Database issue): D427–D434. doi:10.1093/nar/gkq1130. PMC 3013712. PMID 21062816.
  6. Gough, J.; Karplus, K.; Hughey, R.; Chothia, C. (2001). "Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure". Journal of Molecular Biology 313 (4): 903–919. doi:10.1006/jmbi.2001.5080. PMID 11697912.

External links

  • dcGO: a comprehensive ontology database for protein domains.
  • Spiricoil: a sister resource for prediction of coiled coil structure and evolutionary analysis.