A boosting approach for motif modeling using ChIP-chip data

7 April 2005

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 21 (11), 2636-2643
https://doi.org/10.1093/bioinformatics/bti402

Abstract

Motivation: Building an accurate binding model for a transcription factor (TF) is essential for distinguishing its true binding targets from spurious ones. This is an important step toward understanding gene regulation.

Results: This paper describes a boosting approach to modeling TF–DNA binding. Unlike the widely used weight matrix model, which predicts TF–DNA binding from a linear combination of position-specific contributions, our approach builds a TF binding classifier by combining a set of weight-matrix-based classifiers, thus yielding a non-linear binding decision rule. The proposed approach was applied to ChIP-chip data from Saccharomyces cerevisiae. Compared with the weight matrix method, the new approach showed significant improvements in specificity in a majority of cases.

Contact: wwong@hsph.harvard.edu

Supplementary information: The software and the Supplementary data are available at http://biogibbs.stanford.edu/~hong2004/MotifBooster/.
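
The contrast drawn in the abstract, between a single linear weight matrix score and a combination of weight-matrix-based classifiers, can be illustrated with a small Python sketch. It is not the MotifBooster implementation and it does not train anything; it only shows the form such a combined decision rule might take. The sigmoid squashing, the per-model thresholds, the best-window scoring, and all names and numbers below are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def pwm_score(pwm, site):
    """Linear weight matrix score: sum of position-specific contributions."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(site))


def best_site_score(pwm, seq):
    """Score a sequence by its best-scoring window (an assumed convention)."""
    width = len(pwm)
    return max(pwm_score(pwm, seq[i:i + width])
               for i in range(len(seq) - width + 1))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_score(models, seq):
    """Weighted vote over weight-matrix-based classifiers.

    Each model is (alpha, pwm, threshold). Each vote is squashed into
    (-1, 1), so the total is non-linear in the per-position contributions.
    """
    return sum(
        alpha * (2.0 * sigmoid(best_site_score(pwm, seq) - threshold) - 1.0)
        for alpha, pwm, threshold in models
    )


def predict_bound(models, seq):
    """Call the sequence bound when the weighted vote is positive."""
    return ensemble_score(models, seq) > 0.0


# Hypothetical 2-bp matrices (columns ordered A, C, G, T).
pwm_a = [[1.0, -1.0, -1.0, -1.0], [-1.0, 1.0, -1.0, -1.0]]  # favors "AC"
pwm_b = [[-1.0, -1.0, 1.0, -1.0], [-1.0, -1.0, -1.0, 1.0]]  # favors "GT"
models = [(1.0, pwm_a, 1.0), (0.5, pwm_b, 1.0)]

print(predict_bound(models, "TTACTT"))
print(predict_bound(models, "TTTTTT"))
```

In a boosting procedure the weights and matrices would be fitted iteratively from labeled bound and unbound sequences; here they are fixed by hand so that the decision rule itself is easy to inspect.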

