PanOCT: automated clustering of orthologs using conserved gene neighborhood for pan-genomic analysis of bacterial strains and closely related species
Open Access
- 13 August 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 40 (22), e172
- https://doi.org/10.1093/nar/gks757
Abstract
The pan-genome ortholog clustering tool (PanOCT) is a program for pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into distinct orthologous clusters, which homology-only clustering methods cannot do. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared on a set of four publicly available strains of the same bacterial species. All four methods agreed on ∼70% of the clusters and ∼86% of the proteins. The clusters on which the methods disagreed were inspected for evidence of correctness. This yielded 85 high-confidence, manually curated clusters, which were then used to compare all four methods.
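The abstract does not spell out PanOCT's scoring function. The sketch below is therefore a simplified, hypothetical illustration of the general idea of using gene neighborhood to tell paralogs apart. It is not PanOCT's algorithm.

The sketch makes these assumptions:

- Each genome is a linear, ordered list of gene IDs.
- `homologs` (a hypothetical input) maps each gene in genome A to its homologs in genome B, for example from BLAST hits.
- A candidate pair is scored by counting how many of A's neighbors within a window of `k` genes have a homolog inside B's window.
- Genes are then paired greedily, one-to-one.

All function names and the toy gene IDs are made up for illustration.

```python
from typing import Dict, List, Set


def neighbors(genome: List[str], gene: str, k: int) -> List[str]:
    """Return up to k genes on each side of `gene` (linear genome assumed)."""
    i = genome.index(gene)
    return genome[max(0, i - k):i] + genome[i + 1:i + 1 + k]


def neighborhood_score(a: str, b: str, genome_a: List[str], genome_b: List[str],
                       homologs: Dict[str, Set[str]], k: int = 3) -> int:
    """Count neighbors of `a` that have at least one homolog among the neighbors of `b`."""
    window_b = set(neighbors(genome_b, b, k))
    return sum(1 for x in neighbors(genome_a, a, k)
               if homologs.get(x, set()) & window_b)


def assign_orthologs(genome_a: List[str], genome_b: List[str],
                     homologs: Dict[str, Set[str]], k: int = 3) -> Dict[str, str]:
    """Greedy one-to-one pairing: each gene in A (in genome order) takes the unused
    homolog in B whose neighborhood is most conserved (ties go to the smallest ID)."""
    pairs: Dict[str, str] = {}
    used: Set[str] = set()
    for a in genome_a:
        candidates = sorted(b for b in homologs.get(a, set())
                            if b in genome_b and b not in used)
        if not candidates:
            continue
        # max() returns the first candidate with the highest neighborhood score.
        best = max(candidates, key=lambda b: neighborhood_score(
            a, b, genome_a, genome_b, homologs, k))
        pairs[a] = best
        used.add(best)
    return pairs


# Toy data: pA and qA are recent paralogs, so each hits both pB and qB equally.
genome_a = ["x1", "pA", "x2", "y1", "qA", "y2"]
genome_b = ["y1b", "qB", "y2b", "x1b", "pB", "x2b"]
homologs = {
    "x1": {"x1b"}, "x2": {"x2b"}, "y1": {"y1b"}, "y2": {"y2b"},
    "pA": {"pB", "qB"}, "qA": {"pB", "qB"},
}

print(assign_orthologs(genome_a, genome_b, homologs, k=1))  # pA -> pB, qA -> qB
```

Sequence similarity alone cannot choose between pB and qB for pA. The flanking genes x1 and x2 have homologs next to pB but not next to qB, so the neighborhood score separates the two. A real tool would also need to handle circular chromosomes, rearrangements and more than two genomes, all of which this sketch ignores.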