Nonparametric entropy estimation for stationary processes and random fields, with applications to English text

Abstract
We discuss a family of estimators for the entropy rate of a stationary ergodic process and prove their pointwise and mean consistency under a Doeblin-type mixing condition. The estimators are Cesàro averages of longest match-lengths, and their consistency follows from a generalized ergodic theorem due to Maker (1940). We provide examples of their performance on English text, and we generalize our results to countable-alphabet processes and to random fields.
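As an illustration of what an average of longest match-lengths can look like in practice, the following is a minimal Python sketch. It is not taken from the paper: the growing-window form, the `log2(i + 1)` normalization, and the function names `match_length` and `entropy_rate_estimate` are assumptions chosen for this example, and the abstract does not specify the exact estimators.

```python
import math


def match_length(seq, i):
    """Return 1 + the length of the longest prefix of seq[i:] that also
    starts at some earlier position j < i (a match may run past i)."""
    n = len(seq)
    longest = 0
    for j in range(i):
        length = 0
        # Extend the match while symbols agree and we stay inside seq.
        while i + length < n and seq[j + length] == seq[i + length]:
            length += 1
        longest = max(longest, length)
    return longest + 1


def entropy_rate_estimate(seq):
    """Illustrative growing-window estimate, in bits per symbol.

    Assumed form (hypothetical, for this sketch only): the reciprocal of
    the average of match_length(seq, i) / log2(i + 1) over i = 1..n-1.
    Match lengths near the end of seq are truncated by the sequence end.
    """
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two symbols")
    total = 0.0
    for i in range(1, n):
        total += match_length(seq, i) / math.log2(i + 1)
    return (n - 1) / total
```

Calling `entropy_rate_estimate(text)` on a string gives a rough figure in bits per character; highly repetitive input produces long matches and therefore a small estimate. The quadratic-time search is kept for readability and would need a suffix structure for long texts.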
