CATH: an expanded resource to predict protein function through structure and sequence

Open Access

28 November 2016

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 45 (D1), D289-D295
https://doi.org/10.1093/nar/gkw1098

Abstract

The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource in the two years since the previous publication in 2015, including: significant increases in our structural and sequence coverage; expansion of the functional families in CATH; a support vector machine (SVM) built to automatically assign domains to superfamilies; improved search facilities that return alignments of query sequences against multiple sequence alignments; and the redesign of the web pages and download site.
