Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
Open Access
- 8 November 2015
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 44 (D1), D733-D745
- https://doi.org/10.1093/nar/gkv1189
Abstract
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.