CATH database

From Wikipedia, the free encyclopedia
Description: Protein structure classification
Research center: University College London
Laboratory: Institute of Structural and Molecular Biology
Primary citation: Cuff et al. (2011)[1]
Release date: 1997

The CATH Protein Structure Classification is a semi-automatic, hierarchical classification of protein domains published in 1997 by Christine Orengo, Janet Thornton and their colleagues.[2] CATH shares many broad features with SCOP, its principal rival, although the detailed classification differs substantially in many areas.


The name CATH is an acronym of the four main levels of the classification hierarchy:

  1. Class: the overall secondary-structure content of the domain (equivalent to SCOP class).
  2. Architecture: a large-scale grouping of topologies that share particular structural features, namely the overall arrangement of secondary structures regardless of how they are connected.
  3. Topology: domains with the same overall shape and connectivity of secondary structures; high structural similarity but no evidence of homology (equivalent to SCOP fold).
  4. Homologous superfamily: indicative of a demonstrable evolutionary relationship (equivalent to SCOP superfamily).
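Each node in the CATH hierarchy is labelled by a dotted numeric code with one field per level, so a code such as 3.40.50.720 starts with class 3 (alpha and beta) and then narrows through architecture, topology and homologous superfamily. The Python sketch below models that four-level path. The `CathNode` class and its methods are hypothetical illustrations, not part of any official CATH software, and parsing assumes a well-formed four-field code.

```python
from dataclasses import dataclass

# Level names in hierarchy order, as spelled out by the CATH acronym.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# Class numbers used in the first field of a CATH code.
CLASS_NAMES = {
    1: "mostly-alpha",
    2: "mostly-beta",
    3: "alpha and beta",
    4: "few secondary structures",
}


@dataclass(frozen=True)
class CathNode:
    """Hypothetical container for a four-level code such as '3.40.50.720'."""
    c: int
    a: int
    t: int
    h: int

    @classmethod
    def parse(cls, code: str) -> "CathNode":
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated fields, got {code!r}")
        c, a, t, h = (int(p) for p in parts)
        return cls(c, a, t, h)

    def fields(self) -> tuple:
        return (self.c, self.a, self.t, self.h)

    def prefix(self, depth: int) -> str:
        """Code truncated to the first `depth` levels (1 = class only)."""
        return ".".join(str(f) for f in self.fields()[:depth])

    def shared_depth(self, other: "CathNode") -> int:
        """Number of leading levels that two domains have in common."""
        depth = 0
        for mine, theirs in zip(self.fields(), other.fields()):
            if mine != theirs:
                break
            depth += 1
        return depth
```

For example, `CathNode.parse("3.40.50.720").shared_depth(CathNode.parse("3.40.50.300"))` returns 3, because the two codes agree down to the topology level and differ only at the homologous-superfamily level.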

CATH defines four classes: mostly-alpha, mostly-beta, alpha and beta, and few secondary structures.

To understand the CATH classification, it helps to know how it is constructed: much of the work is done by automatic methods, but there are important manual elements.

The first step is to split each protein into domains. An unequivocal definition of a domain is difficult to produce, and domain assignment is one area in which CATH and SCOP differ.

The domains are automatically sorted into classes and clustered on the basis of sequence similarity; these groups form the H level of the classification. The Topology level is formed by structural comparison of the homologous groups. Finally, the Architecture level is assigned manually.
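The sequence-based grouping step can be illustrated with single-linkage clustering, in which any two domains joined by a chain of sufficiently similar pairs end up in the same group. This is a minimal sketch, assuming pairwise similarity scores have already been computed and using a hypothetical cut-off; it is a generic technique chosen for illustration, not CATH's actual protocol, whose comparison methods and thresholds are not described here.

```python
def cluster_by_similarity(
    domains: list[str],
    similarity: dict[tuple[str, str], float],
    threshold: float,
) -> list[list[str]]:
    """Single-linkage clustering with a union-find structure.

    Two domains land in the same group if they are connected by a chain of
    pairs whose similarity score is at least `threshold`.
    """
    parent = {d: d for d in domains}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (a, b), score in similarity.items():
        if score >= threshold:
            union(a, b)

    groups: dict[str, list[str]] = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical domain identifiers and similarity scores.
    domains = ["d1", "d2", "d3", "d4", "d5"]
    scores = {("d1", "d2"): 0.62, ("d2", "d3"): 0.41,
              ("d3", "d4"): 0.12, ("d4", "d5"): 0.55}
    print(cluster_by_similarity(domains, scores, threshold=0.35))
    # [['d1', 'd2', 'd3'], ['d4', 'd5']]
```

Single linkage is convenient here because a new domain only needs to resemble one existing member to join a group, but it can also chain distant sequences together, so a real pipeline would choose its linkage rule and cut-off with care.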

Class-level classification is based on four criteria:

  1. Secondary structure content;
  2. Secondary structure contacts;
  3. Secondary structure alternation score; and
  4. Percentage of parallel strands.
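A simplified sketch of class assignment using only the first criterion, secondary structure content, is shown below. The thresholds `MIN_SS_CONTENT` and `MINOR_LIMIT` are hypothetical values chosen for illustration, not the cut-offs CATH uses, and the contact, alternation and parallel-strand criteria are omitted.

```python
# Hypothetical thresholds for illustration only; not CATH's published values.
MIN_SS_CONTENT = 0.20  # below this total helix + strand fraction: "few secondary structures"
MINOR_LIMIT = 0.05     # a secondary-structure type below this fraction counts as absent


def assign_class(helix_fraction: float, strand_fraction: float) -> str:
    """Assign one of the four CATH class labels from secondary-structure content.

    Both arguments are fractions of the domain's residues (0.0 to 1.0).
    """
    if not (0.0 <= helix_fraction <= 1.0 and 0.0 <= strand_fraction <= 1.0):
        raise ValueError("fractions must lie between 0 and 1")
    if helix_fraction + strand_fraction > 1.0:
        raise ValueError("helix and strand fractions cannot exceed 1 in total")

    if helix_fraction + strand_fraction < MIN_SS_CONTENT:
        return "few secondary structures"
    if strand_fraction < MINOR_LIMIT:
        return "mostly-alpha"
    if helix_fraction < MINOR_LIMIT:
        return "mostly-beta"
    return "alpha and beta"


if __name__ == "__main__":
    for helix, strand in [(0.55, 0.02), (0.03, 0.45), (0.30, 0.25), (0.05, 0.04)]:
        print(helix, strand, assign_class(helix, strand))
```

Under these assumed thresholds, a domain with 55% helix and 2% strand is labelled mostly-alpha, while one with only 9% of its residues in regular secondary structure falls into the few-secondary-structures class.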

More detail on this process, and comparisons between SCOP, CATH and other classifications such as FSSP, can be found in Hadley & Jones (1999)[3] and Day et al. (2003).[4]


References


  1. Cuff AL, Sillitoe I, Lewis T, Clegg AB, Rentzsch R, Furnham N, Pellegrini-Calace M, Jones D, Thornton J, Orengo CA (2011). "Extending CATH: increasing coverage of the protein structure universe and linking structure with function". Nucleic Acids Res. 39 (Database issue): D420–6. doi:10.1093/nar/gkq1001. PMC 3013636. PMID 21097779.
  2. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (1997). "CATH--a hierarchic classification of protein domain structures". Structure 5 (8): 1093–1108. doi:10.1016/S0969-2126(97)00260-8. PMID 9309224.
  3. Hadley C, Jones DT (1999). "A systematic comparison of protein structure classifications: SCOP, CATH and FSSP". Structure 7 (9): 1099–1112. doi:10.1016/S0969-2126(99)80177-4. PMID 10508779.
  4. Day R, Beck DA, Armen RS, Daggett V (2003). "A consensus view of fold space: Combining SCOP, CATH, and the Dali Domain Dictionary". Protein Sci. 12 (10): 2150–2160. doi:10.1110/ps.0306803. PMC 2366924. PMID 14500873.
