Gene Context Analysis in the Integrated Microbial Genomes (IMG) Data Management System
Open Access
- 24 November 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 4 (11), e7979
- https://doi.org/10.1371/journal.pone.0007979
Abstract
Computational methods for determining the function of genes in newly sequenced genomes have traditionally been based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often have similar gene context, and it relies on the identification of such events across a phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov.
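The idea behind co-occurrence profiles mentioned in the abstract can be illustrated with a minimal sketch. It is not IMG's implementation: the family names, genome identifiers, the `jaccard` and `cooccurring_pairs` helpers, and the 0.75 threshold are all hypothetical, and Jaccard similarity is used here only as a simple stand-in for whatever scoring the system actually applies. Each gene family is represented by the set of genomes in which it occurs, and pairs of families with highly overlapping sets are reported as candidates for a functional link.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of genome identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurring_pairs(profiles, threshold=0.75):
    """Return (family_a, family_b, score) for every pair of gene families
    whose genome presence profiles overlap at or above the threshold."""
    pairs = []
    # Sort by family name so the output order is deterministic.
    for (fam_a, genomes_a), (fam_b, genomes_b) in combinations(
        sorted(profiles.items()), 2
    ):
        score = jaccard(genomes_a, genomes_b)
        if score >= threshold:
            pairs.append((fam_a, fam_b, score))
    return pairs


# Hypothetical toy data: gene family -> genomes in which it is present.
profiles = {
    "famA": {"g1", "g2", "g3", "g4"},
    "famB": {"g1", "g2", "g3"},
    "famC": {"g5", "g6"},
}

for fam_a, fam_b, score in cooccurring_pairs(profiles):
    print(fam_a, fam_b, score)
```

In this toy data, famA and famB share three of four genomes and are reported together, while famC never co-occurs with either. A realistic analysis would also need to account for phylogenetic relatedness among genomes, which is why the abstract stresses a phylogenetically diverse collection.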