Domain-centric database to uncover structure of minimally characterized viral genomes
Open Access
- 25 June 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in Scientific Data
- Vol. 7 (1), 1-11
- https://doi.org/10.1038/s41597-020-0536-1
Abstract
Protein domain-based approaches to analyzing sequence data are valuable tools for examining genomic architecture across the genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. Domains were identified from viral whole-genome sequences using automated profile Hidden Markov Models (pHMMs). The dataset records information such as sequence position and neighboring domains for 30,947 pHMM-identified domains found across these reference viral genomes. This study also describes the framework for constructing "domain neighborhoods", as well as the dataset representing it. These data can be used to examine shared and differing domain architectures across viral genomes, to elucidate potential functional properties of genes, and potentially to classify viruses.
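The abstract does not define a "domain neighborhood" precisely. The following is a minimal sketch that assumes a neighborhood is the k domains immediately upstream and downstream of each domain when a genome's pHMM hits are ordered by start coordinate. The `(genome_id, domain_name, start, end)` input layout, the function names, and the Jaccard comparison of neighbor pairs are hypothetical illustrations. They are not the authors' framework or the dataset's actual schema.

```python
from collections import defaultdict


def domain_neighborhoods(hits, k=1):
    """Hypothetical sketch: group hits by genome, order them by position,
    and record the k nearest domain names on each side of every domain.

    hits: iterable of (genome_id, domain_name, start, end) tuples (assumed layout).
    Returns {genome_id: [(domain, upstream_tuple, downstream_tuple), ...]}.
    """
    by_genome = defaultdict(list)
    for genome, domain, start, end in hits:
        by_genome[genome].append((start, end, domain))

    result = {}
    for genome, rows in by_genome.items():
        rows.sort()  # positional order along the genome (start, then end)
        names = [domain for _, _, domain in rows]
        result[genome] = [
            (name, tuple(names[max(0, i - k):i]), tuple(names[i + 1:i + 1 + k]))
            for i, name in enumerate(names)
        ]
    return result


def neighbor_pairs(neighborhood):
    """Unordered pairs of domain names that co-occur within one neighborhood window."""
    pairs = set()
    for name, _, downstream in neighborhood:
        for other in downstream:
            pairs.add(frozenset((name, other)))
    return pairs


def shared_architecture(a, b):
    """Jaccard similarity of two genomes' neighbor-pair sets (illustrative metric)."""
    pa, pb = neighbor_pairs(a), neighbor_pairs(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy, made-up hits; not drawn from the published dataset.
    hits = [
        ("virusA", "Helicase", 500, 1200),
        ("virusA", "RdRp", 1300, 2500),
        ("virusA", "Capsid", 10, 400),
        ("virusB", "Capsid", 20, 380),
        ("virusB", "RdRp", 900, 2000),
    ]
    for k in (1, 2):
        nb = domain_neighborhoods(hits, k=k)
        print(k, shared_architecture(nb["virusA"], nb["virusB"]))
```

With k=1 the two toy genomes share no adjacent pairs (similarity 0.0). Widening the window to k=2 lets the Capsid/RdRp pair match across both genomes (similarity 1/3). The window size is therefore a key design choice when comparing architectures.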