The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis

Open Access

17 December 2004

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 33 (Database), D247-D251
https://doi.org/10.1093/nar/gki024

Abstract

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616470 domain sequences classified into 23876 sequence families. This results in a significant expansion of the CATH HMM library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous Superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/), containing specific sequence, structural and functional information for each superfamily in CATH, considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH-associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl), providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed, and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.
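To give a sense of the scale these counts imply, the following minimal sketch derives a few averages from the figures quoted in the abstract. The function and key names are illustrative assumptions, not part of CATH or the paper, and the ratios are simple averages that say nothing about how unevenly domains are distributed across families.

```python
# Counts quoted in the abstract (structural CATH and the sequence-extended CATH).
STRUCTURAL_DOMAINS = 43229
SUPERFAMILIES = 1467
STRUCTURAL_SEQUENCE_FAMILIES = 5107
EXTENDED_DOMAIN_SEQUENCES = 616470
EXTENDED_SEQUENCE_FAMILIES = 23876


def summarise_counts() -> dict[str, float]:
    """Return simple averages derived from the abstract's counts.

    The key names are hypothetical labels chosen for this illustration.
    """
    return {
        # Mean number of structural domains per superfamily.
        "domains_per_superfamily": STRUCTURAL_DOMAINS / SUPERFAMILIES,
        # Mean number of structural domains per sequence family.
        "domains_per_sequence_family": STRUCTURAL_DOMAINS / STRUCTURAL_SEQUENCE_FAMILIES,
        # How many times larger the extended set is than the structural set.
        "sequence_expansion_factor": EXTENDED_DOMAIN_SEQUENCES / STRUCTURAL_DOMAINS,
        # Mean number of domain sequences per family in the extended database.
        "extended_sequences_per_family": EXTENDED_DOMAIN_SEQUENCES / EXTENDED_SEQUENCE_FAMILIES,
    }


if __name__ == "__main__":
    for name, value in summarise_counts().items():
        print(f"{name}: {value:.2f}")
```

On these figures, adding sequence relatives enlarges the domain set roughly fourteen-fold relative to the structurally characterised domains alone.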
