Mixture Modeling for Genome‐Wide Localization of Transcription Factors

1 March 2007

journal article
research article
Published by Oxford University Press (OUP) in Biometrics

Vol. 63 (1), 10-21
https://doi.org/10.1111/j.1541-0420.2005.00659.x

Abstract

Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein–DNA interactions. Data from tiling arrays encompass DNA–protein interaction measurements on thousands or millions of short oligonucleotides (probes) tiling a whole chromosome or genome. We propose a new model-based method for analyzing ChIP-chip data. The proposed model is motivated by the widely used two-component multinomial mixture model of de novo motif finding. It utilizes a hierarchical gamma mixture model of binding intensities while incorporating the inherent spatial structure of the data. In this model, each genomic region belongs to one of two general groups: regions with a local protein–DNA interaction (peak) and regions lacking this interaction. Individual probes within a genomic region are allowed to have different localization rates, accommodating different binding affinities. A novel feature of this model is the incorporation of a distribution for the peak size derived from the experimental design and parameters. This relaxes the fixed peak size assumption that is commonly employed when computing a test statistic for these types of spatial data. Simulation studies and a real data application demonstrate good operating characteristics of the method, including high sensitivity with small sample sizes, when compared to available alternative methods.
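
To make the mixture idea concrete, the following is a minimal sketch of how a two-component gamma mixture could assign a probe intensity to the "bound" or "unbound" group. It is not the authors' hierarchical model: it ignores the spatial structure, the probe-specific localization rates, and the peak size distribution described above. The function names, the mixing proportion, and the gamma shape and rate values are all illustrative assumptions, not values taken from the article.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def posterior_bound(x, pi_bound, bound=(6.0, 2.0), unbound=(2.0, 2.0)):
    """Posterior probability that intensity x came from the bound component.

    pi_bound is the assumed prior proportion of bound probes; bound and
    unbound are hypothetical (shape, rate) pairs for the two components.
    """
    log_b = math.log(pi_bound) + gamma_logpdf(x, *bound)
    log_u = math.log(1.0 - pi_bound) + gamma_logpdf(x, *unbound)
    # Subtract the larger term before exponentiating for numerical stability.
    m = max(log_b, log_u)
    w_b = math.exp(log_b - m)
    w_u = math.exp(log_u - m)
    return w_b / (w_b + w_u)


if __name__ == "__main__":
    # Made-up intensities; higher values should look more like bound probes.
    for intensity in (0.5, 2.0, 5.0):
        print(intensity, posterior_bound(intensity, pi_bound=0.1))
```

With these assumed parameters the bound component has the larger mean, so the posterior probability rises with intensity. A region-level analysis of the kind the abstract describes would combine evidence across neighboring probes rather than score each probe in isolation.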
