A boosting approach for motif modeling using ChIP-chip data
- 7 April 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (11), 2636-2643
- https://doi.org/10.1093/bioinformatics/bti402
Abstract
Motivation: Building an accurate binding model for a transcription factor (TF) is essential for differentiating its true binding targets from spurious ones. This is an important step toward understanding gene regulation.
Results: This paper describes a boosting approach to modeling TF–DNA binding. Unlike the widely used weight matrix model, which predicts TF–DNA binding from a linear combination of position-specific contributions, our approach builds a TF binding classifier by combining a set of weight-matrix-based classifiers, thus yielding a non-linear binding decision rule. The proposed approach was applied to ChIP-chip data from Saccharomyces cerevisiae. Compared with the weight matrix method, the new approach showed significant improvements in specificity in a majority of cases.
Contact: wwong@hsph.harvard.edu
Supplementary information: The software and the Supplementary data are available at http://biogibbs.stanford.edu/~hong2004/MotifBooster/
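The contrast drawn in the abstract, between a single weight matrix (a linear sum of position-specific contributions) and a combination of several weight-matrix-based classifiers (a non-linear decision rule), can be illustrated with a minimal sketch. This is not the authors' MotifBooster implementation: the toy matrices, thresholds, member weights, and all function names below are hypothetical, and the weighted vote is only one generic way an ensemble of such classifiers might be combined.

```python
BASES = "ACGT"


def pwm_score(pwm, site):
    """Linear weight matrix score: sum of per-position contributions."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(site))


def best_window_score(pwm, seq):
    """Highest weight matrix score over all windows of the matrix width."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the weight matrix")
    return max(pwm_score(pwm, seq[i:i + width])
               for i in range(len(seq) - width + 1))


def base_classifier(pwm, threshold, seq):
    """Weight-matrix-based classifier: +1 (bound) or -1 (not bound)."""
    return 1 if best_window_score(pwm, seq) >= threshold else -1


def ensemble_predict(members, seq):
    """Weighted vote over (alpha, pwm, threshold) members.

    Each member's output is thresholded before the vote, so the combined
    rule is non-linear in the position-specific contributions.
    """
    total = sum(alpha * base_classifier(pwm, thr, seq)
                for alpha, pwm, thr in members)
    return 1 if total >= 0 else -1


# Hypothetical toy matrices (rows = positions, columns = A, C, G, T).
PWM_A = [[1, -1, -1, -1], [-1, 1, -1, -1], [-1, -1, 1, -1]]  # favors "ACG"
PWM_B = [[-1, -1, -1, 1]] * 3                                # favors "TTT"

# Hypothetical member weights and thresholds, chosen only for illustration.
MEMBERS = [(0.7, PWM_A, 2.0), (0.4, PWM_B, 2.0)]

if __name__ == "__main__":
    for seq in ("TTACGTT", "GGGGGG"):
        print(seq, ensemble_predict(MEMBERS, seq))
```

In a boosting setting the member weights would be learned from labeled sequences (for example, ChIP-chip bound versus unbound promoters) rather than fixed by hand as they are here.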