DFAST and DAGA: web-based integrated genome annotation tools and resources
Open Access
- 1 January 2016
- journal article
- Published by BMFH Press in Bioscience of Microbiota, Food and Health
- Vol. 35 (4), 173-184
- https://doi.org/10.12938/bmfh.16-003
Abstract
Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have long been persistent problems. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with tools for assessing quality and taxonomy. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission can be completed seamlessly online; this online workspace should be especially useful for users without bioinformatics skills. In addition, we have developed a genome repository, the DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and the Sequence Read Archive (SRA). All genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess taxonomic position from genomic sequence information, we used average nucleotide identity (ANI), which showed high discriminative power in determining whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. Using the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii whose between-subgroup ANI falls below the widely accepted 95% threshold for differentiating species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.
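The abstract relies on ANI with a 95% cut-off to decide whether two genomes belong to the same species. The Python sketch below illustrates that logic under several assumptions. Per-fragment alignment hits are taken as already computed (running an aligner is out of scope). The 30% identity and 70% aligned-length filters follow the commonly used BLAST-based ANI convention, not any documented DFAST setting. The single-linkage grouping is a hypothetical illustration of species delineation, not the procedure used to build DAGA. All names (`FragmentHit`, `ani`, `same_species`, `group_by_ani`) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Assumed thresholds (illustrative, modeled on the common BLAST-based ANI
# convention; not taken from DFAST's documentation).
MIN_IDENTITY = 30.0           # minimum percent identity for a fragment hit
MIN_ALIGNED_FRACTION = 0.70   # minimum aligned length / fragment length
SPECIES_THRESHOLD = 95.0      # widely used ANI cut-off for "same species"


@dataclass
class FragmentHit:
    """Best hit of one query-genome fragment against a reference genome."""
    identity: float          # percent identity of the alignment
    aligned_fraction: float  # aligned length divided by fragment length


def ani(hits: Sequence[FragmentHit]) -> float:
    """Mean identity over the fragment hits that pass both filters."""
    kept = [
        h.identity
        for h in hits
        if h.identity >= MIN_IDENTITY and h.aligned_fraction >= MIN_ALIGNED_FRACTION
    ]
    if not kept:
        raise ValueError("no fragment hit passed the identity/alignment filters")
    return mean(kept)


def same_species(ani_value: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """Apply the ANI species threshold to a single genome pair."""
    return ani_value >= threshold


def group_by_ani(
    names: List[str],
    pairwise: Dict[Tuple[str, str], float],
    threshold: float = SPECIES_THRESHOLD,
) -> List[List[str]]:
    """Hypothetical single-linkage grouping: two genomes share a group if any
    chain of pairs connecting them has ANI >= threshold (union-find)."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Walk to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in pairwise.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical hits: the third fragment aligns over only 50% of its
    # length, so it is excluded and ANI is the mean of 98.0 and 96.0.
    hits = [FragmentHit(98.0, 0.95), FragmentHit(96.0, 0.90), FragmentHit(40.0, 0.50)]
    value = ani(hits)
    print(value, same_species(value))  # 97.0 True
```

In this toy setup, a genome pair at 97.0% ANI would be treated as conspecific, while subgroups whose pairwise ANI falls below 95% would land in separate groups, which is the kind of split the abstract reports for L. gasseri and L. jensenii.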