A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even when no sequence similarity is evident. Superfamilies typically contain several protein families, each of which shows sequence similarity among its members. The term protein clan is commonly used for protease superfamilies, following the MEROPS protease classification system.
Members of different families within a superfamily typically show no detectable sequence similarity. Indeed, they are often impossible to align because of frequent insertions and deletions. In the PA clan of proteases, for example, not a single residue is conserved across the superfamily, not even those in the catalytic triad. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan.
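The contrast between similarity within a family and its absence across a superfamily is often expressed as percent identity between two aligned sequences. The following is a minimal sketch in Python, assuming the two sequences have already been aligned by some other tool and that gaps are written as "-". The function name `percent_identity` and the toy fragments are illustrative only and are not real protein sequences. Several definitions of percent identity exist; this one counts only columns in which neither sequence has a gap.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0  # nothing to compare, report no identity
    return 100.0 * identical / compared


# Toy, made-up aligned fragments (assumptions for illustration only).
same_family = percent_identity("GDSGGPLV", "GDSGGPMV")
distant = percent_identity("GDSGGPLV", "AK-TWQNE")
print(same_family, distant)
```

In this toy input the first pair differs at a single column, while the second pair shares no residues at all, which is the kind of situation in which sequence comparison alone cannot support an inference of common ancestry.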
Structure is much more evolutionarily conserved than sequence (as the PA clan of proteases also illustrates). Very few residues show much amino acid sequence conservation; however, secondary structural elements are highly conserved, as is their arrangement into tertiary structural motifs. Structural alignment programs such as DALI can use the 3D structure of a protein of interest to find proteins with similar folds. Comparing 3D structures can therefore identify evolutionary relatedness that sequence comparison cannot.
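One reason structures can be compared without any sequence similarity is that the distances between residues inside a protein do not change when the whole molecule is rotated or moved. The sketch below illustrates that idea in Python, assuming Cα coordinates are available and that equivalent residues are already paired by index. It is not the DALI algorithm, which must also search for the best residue pairing; the function names and coordinates are hypothetical.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between points within one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mean_distance_difference(coords_a, coords_b):
    """Mean absolute difference between two intramolecular distance matrices.

    Assumes residue i in structure A is equivalent to residue i in B.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    n = len(coords_a)
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(da[i][j] - db[i][j])
            pairs += 1
    return total / pairs if pairs else 0.0


# Made-up coordinates in angstroms (assumptions for illustration only).
fold = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.5)]
# Same shape, rotated 90 degrees about the z axis and shifted.
rotated = [(-y + 5.0, x + 5.0, z + 5.0) for x, y, z in fold]
# A different, fully extended shape.
stretched = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

print(mean_distance_difference(fold, rotated))
print(mean_distance_difference(fold, stretched))
```

The rotated copy scores close to zero because its internal distances are unchanged, whereas the extended chain scores much higher. Comparing internal distances avoids having to superimpose the two structures first.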
The catalytic mechanism of enzymes within a superfamily is typically conserved, although substrate specificity may differ significantly. Catalytic residues also tend to occur in the same order in the protein sequence. Once again, the PA clan of proteases serves as an example: even though families within the superfamily use different nucleophiles, they all perform covalent, nucleophilic catalysis on proteins, peptides or amino acids through a similar mechanism.
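The observation that catalytic residues occur in the same order can be checked by sorting each enzyme's catalytic residues by sequence position and comparing their roles rather than their identities, since the nucleophile may differ between families. The sketch below assumes the catalytic positions are already known. The residue numbers for chymotrypsin (His57, Asp102, Ser195) and TEV protease (His46, Asp81, Cys151) are commonly cited values supplied here as illustrative input and are not taken from the text above; the function name `role_order` is hypothetical.

```python
def role_order(sites):
    """Return catalytic roles ordered by position along the sequence.

    sites: mapping of sequence position -> catalytic role.
    """
    return tuple(role for _position, role in sorted(sites.items()))


# Illustrative input (assumed values, see the paragraph above).
chymotrypsin = {57: "base", 102: "acid", 195: "nucleophile"}  # His, Asp, Ser
tev_protease = {46: "base", 81: "acid", 151: "nucleophile"}   # His, Asp, Cys

same_order = role_order(chymotrypsin) == role_order(tev_protease)
print(role_order(chymotrypsin), same_order)
```

Comparing roles instead of residue names lets a serine protease and a cysteine protease be recognised as sharing the same triad arrangement.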
Protein superfamilies represent the current limits of our ability to identify common ancestry. They are the largest evolutionary groupings that can currently be supported by direct evidence. The divergence events they record are therefore amongst the most ancient evolutionary events currently studied. Some superfamilies have members present in all kingdoms of life, indicating that the last common ancestor of that superfamily was present in the last universal common ancestor of all life (LUCA).
Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).
PA clan of chymotrypsin-like proteases - Members share a double β-barrel fold and similar proteolysis mechanisms but have sequence identity of <10%. The clan contains both cysteine and serine proteases (different nucleophiles).
α/β hydrolase superfamily - Members share an α/β sheet containing 8 strands connected by helices, with catalytic triad residues in the same order. Activities include proteases, lipases, peroxidases, esterases, epoxide hydrolases and dehalogenases.
Alkaline phosphatase superfamily -
Ras superfamily - Members share a common catalytic G domain.
Protein superfamily resources
- Pfam - Protein families database of alignments and HMMs
- PROSITE - Database of protein domains, families and functional sites
- PIRSF - SuperFamily Classification System
- PASS2 - Protein Alignment as Structural Superfamilies v2
- SUPERFAMILY - Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
- SCOP and CATH - Classifications of protein structures into superfamilies, families and domains
Similarly, there are algorithms that search the PDB for proteins with structural similarity to a target structure, for example:
- DALI - Structural alignment based on a distance alignment matrix method
See also
- Structural alignment
- Protein family
- Protein structure
- Protein domains
- Homology (biology)
- List of gene families