Adaptive estimation for Hawkes processes; application to genome analysis
Open Access
- 1 October 2010
- journal article
- research article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 38 (5), 2781-2822
- https://doi.org/10.1214/10-aos806
Abstract
The aim of this paper is to provide a new method for detecting either favored or avoided distances between genomic events along DNA sequences. These events are modeled by a Hawkes process. The biological problem is complex enough to require a nonasymptotic penalized model selection approach. We provide a theoretical penalty that satisfies an oracle inequality even for quite complex families of models. The resulting theoretical estimator is shown to be adaptive minimax for Hölderian functions with regularity in (1/2, 1]; these aspects have not previously been studied for the Hawkes process. Moreover, we introduce an efficient strategy, named Islands, which is not classically used in model selection but turns out to be particularly relevant to the biological question we want to answer. Since a multiplicative constant in the theoretical penalty is not computable in practice, we provide extensive simulations to find a data-driven calibration of this constant. The results obtained on real genomic data are consistent with biological knowledge and possibly refine it.
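For readers unfamiliar with the model, a linear Hawkes process is commonly described by a conditional intensity of the form lambda(t) = nu + sum over past events t_i of h(t - t_i), where nu is a baseline rate and h is an interaction function: a bump in h at lag d corresponds to a favored distance d between events. The following is a minimal simulation sketch by thinning, not the paper's estimation procedure or its Islands strategy. It assumes a nonnegative, piecewise-constant h with bounded support and nu > 0; the names `hawkes_intensity`, `simulate_hawkes`, `heights` and `support` are hypothetical and chosen only for illustration.

```python
import random


def hawkes_intensity(t, events, nu, heights, support):
    """Intensity nu + sum_{t_i < t} h(t - t_i) at time t.

    Assumption: h is piecewise constant on [0, support), with one value
    in `heights` per bin of equal width, and zero elsewhere.
    `events` must be sorted in increasing order.
    """
    width = support / len(heights)
    total = nu
    for t_i in reversed(events):
        lag = t - t_i
        if lag >= support:
            break  # all earlier events are even further away
        if lag > 0:
            # Clamp the bin index to guard against floating-point rounding.
            total += heights[min(int(lag / width), len(heights) - 1)]
    return total


def simulate_hawkes(T, nu, heights, support, seed=0):
    """Simulate event times on [0, T) by thinning (accept/reject).

    Assumptions: nu > 0 and all heights >= 0, so that
    nu + (number of recent events) * max(heights) bounds the intensity
    until the next accepted event.
    """
    rng = random.Random(seed)
    h_max = max(heights)
    events = []
    t = 0.0
    while True:
        # Events that can still influence the intensity after time t.
        n_active = sum(1 for t_i in events if t - t_i < support)
        bound = nu + n_active * h_max
        # Candidate point from a homogeneous Poisson process of rate `bound`.
        t += rng.expovariate(bound)
        if t >= T:
            return events
        # Keep the candidate with probability intensity(t) / bound.
        if rng.random() * bound <= hawkes_intensity(t, events, nu, heights, support):
            events.append(t)
```

As a usage example, `simulate_hawkes(50.0, 0.5, [0.0, 1.5, 0.0], 1.5)` draws events whose followers are favored at lags between 0.5 and 1.0. The integral of h is 0.75 here; keeping it below 1 is the usual condition for the process not to explode.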