CATH: an expanded resource to predict protein function through structure and sequence
Open Access
- 28 November 2016
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 45 (D1), D289-D295
- https://doi.org/10.1093/nar/gkw1098
Abstract
The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; and the redesign of the web pages and download site.