Improving genome annotations using phylogenetic profile anomaly detection
Open Access
- 16 September 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (4), 464-470
- https://doi.org/10.1093/bioinformatics/bti027
Abstract
Motivation: A promising strategy for refining genome annotations is to detect features that conflict with known functional or evolutionary relationships between groups of genes. Previous work in this area has been focused on investigating the absence of ‘housekeeping’ genes or components of well-studied pathways. We have sought to develop a method for improving new annotations that can automatically synthesize and use the information available in a database of other annotated genomes.
Results: We show that a probabilistic model of phylogenetic profiles, trained from a database of curated genome annotations, can be used to reliably detect errors in new annotations. We use our method to identify 22 genes that were missed in previously published annotations of prokaryotic genomes.
Availability: The method was evaluated using MATLAB and open source software referenced in this work. Scripts and datasets are available from the authors upon request.
Contact: tarjei@broad.mit.edu