RefSeq: an update on prokaryotic genome annotation and curation
Open Access
- 4 January 2018
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 46 (D1), D851-D860
- https://doi.org/10.1093/nar/gkx1068
Abstract
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule, BlastRules. Continual curation of supporting evidence and propagation of improved names onto RefSeq proteins ensure that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.