ChIPmix: mixture model of regressions for two-color ChIP–chip analysis

Abstract
Motivation: Chromatin immunoprecipitation (ChIP) combined with DNA microarrays (ChIP–chip) is a high-throughput technology for investigating DNA–protein binding and chromatin/histone modifications. ChIP–chip data require adapted statistical methods to identify enriched regions. All previously proposed methods are based on the analysis of the log ratio log(Ip/Input). However, the assumption that the log ratio is a relevant quantity for assessing the status of a probe does not always hold, and relying on it can lead to poor data interpretation.

Results: Instead of working on the log ratio, we work directly with the Ip and Input signals of each probe by modeling the distribution of the Ip signal conditional on the Input signal. We propose a method named ChIPmix, based on a linear regression mixture model, to identify the actual binding targets of the protein under study. Moreover, ChIPmix controls the proportion of false positives. The efficiency of ChIPmix is illustrated on several datasets obtained from different organisms and hybridized on either tiling or promoter arrays. This validation shows that ChIPmix is suitable for any two-color array, whatever its density, and provides promising results.

Availability: The ChIPmix method is implemented in R and is available at http://www.agroparistech.fr/mia/outil_A.html

Contact: marie_laure.martin@agroparistech.fr
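The central idea, regressing the Ip signal on the Input signal with a two-component mixture (one regression line for unenriched probes, one for enriched probes), can be illustrated with the EM algorithm. The sketch below is a minimal illustration, not the ChIPmix R implementation. It assumes log-scale intensities, a common residual variance shared by both components, and a starting point in which component 1 lies above component 0. The function name `fit_regression_mixture` is hypothetical. The false-positive control described above is not reproduced here; the returned posterior probabilities are only the input such a rule would work from.

```python
import math
import random


def _wls(x, y, w):
    """Weighted least-squares fit of y = a + b*x with weights w; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def _log_normal(r, var):
    """Log density of a centred Gaussian with variance var, evaluated at r."""
    return -0.5 * (math.log(2.0 * math.pi * var) + r * r / var)


def fit_regression_mixture(x, y, n_iter=500, tol=1e-8):
    """EM for y_i | x_i, z_i = k ~ N(a_k + b_k * x_i, var), P(z_i = 1) = pi1.

    Assumption for this sketch: a common variance for both components.
    Component 0 plays the role of the normal (unenriched) probes and
    component 1 the enriched probes. Returns (params, var, pi1, tau), where
    tau[i] is the posterior probability that probe i belongs to component 1.
    """
    n = len(x)
    # Initialisation: one OLS line, then shift the intercepts apart by one residual SD.
    a, b = _wls(x, y, [1.0] * n)
    sd = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / n)
    params = [(a - sd, b), (a + sd, b)]
    var, pi1 = sd * sd, 0.5
    prev_ll = -math.inf
    tau = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space.
        ll = 0.0
        for i in range(n):
            l0 = math.log(1.0 - pi1) + _log_normal(y[i] - params[0][0] - params[0][1] * x[i], var)
            l1 = math.log(pi1) + _log_normal(y[i] - params[1][0] - params[1][1] * x[i], var)
            m = max(l0, l1)
            lse = m + math.log(math.exp(l0 - m) + math.exp(l1 - m))
            tau[i] = math.exp(l1 - lse)
            ll += lse
        # Stop once the observed-data log-likelihood no longer improves.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: weighted regressions, mixing proportion, pooled variance.
        w0 = [1.0 - t for t in tau]
        params = [_wls(x, y, w0), _wls(x, y, tau)]
        pi1 = sum(tau) / n
        ss = 0.0
        for i in range(n):
            r0 = y[i] - params[0][0] - params[0][1] * x[i]
            r1 = y[i] - params[1][0] - params[1][1] * x[i]
            ss += w0[i] * r0 * r0 + tau[i] * r1 * r1
        var = ss / n
    return params, var, pi1, tau
```

As a usage example, one could simulate log-Input values `x` and log-Ip values `y` in which a minority of probes sit on a higher regression line, call `fit_regression_mixture(x, y)`, and label a probe as enriched when `tau[i] > 0.5`. That maximum a posteriori rule is a simple stand-in. It is not the false-positive-controlled decision rule that ChIPmix uses.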