Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes
Open Access
- 21 April 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 37 (10), e72
- https://doi.org/10.1093/nar/gkp248
Abstract
Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, we describe a novel algorithm for genome-wide de novo prediction of CRBSs with high accuracy. We designed our algorithm to circumvent three identified difficulties for CRBS prediction using comparative genomics principles based on a new method for the selection of reference genomes, a new metric for measuring the similarity of CRBSs, and a new graph clustering procedure. When operon structures are correctly predicted, our algorithm can predict 81% of known individual binding sites belonging to 94% of known cis-regulatory motifs in the Escherichia coli K12 genome, while achieving high prediction specificity. Our algorithm has also achieved similar prediction accuracy in the Bacillus subtilis genome, suggesting that it is very robust, and thus can be applied to any other sequenced prokaryotic genome. When compared with the prior state-of-the-art algorithms, our algorithm outperforms them in both prediction sensitivity and specificity.
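The abstract reports sensitivity at two levels: the fraction of known individual binding sites recovered, and the fraction of known motifs with at least one recovered site. The following is a minimal sketch of how those two figures might be computed, not the authors' evaluation code. The function name `sensitivity`, the site and motif identifiers, and the toy data are all hypothetical and chosen only for illustration.

```python
def sensitivity(known_sites, predicted_sites):
    """Return (site-level, motif-level) sensitivity.

    known_sites: dict mapping a known site ID to the motif it belongs to.
    predicted_sites: set of known site IDs that a predictor recovered.
    Both structures are illustrative assumptions, not the paper's data model.
    """
    recovered = [s for s in known_sites if s in predicted_sites]
    # Site level: fraction of known individual sites that were recovered.
    site_sens = len(recovered) / len(known_sites)
    # Motif level: a motif counts as found if any of its sites was recovered.
    all_motifs = set(known_sites.values())
    hit_motifs = {known_sites[s] for s in recovered}
    motif_sens = len(hit_motifs) / len(all_motifs)
    return site_sens, motif_sens


# Toy example (hypothetical data): five known sites across three motifs.
known = {"s1": "motifA", "s2": "motifA", "s3": "motifB",
         "s4": "motifB", "s5": "motifC"}
predicted = {"s1", "s3", "s4", "s5"}
site_sens, motif_sens = sensitivity(known, predicted)
print(site_sens, motif_sens)
```

In this toy case four of the five sites are recovered while every motif has at least one recovered site, which shows why motif-level sensitivity can exceed site-level sensitivity, as in the 81% and 94% figures quoted above.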