Always Good Turing: Asymptotically Optimal Probability Estimation

17 October 2003

journal article
Published by American Association for the Advancement of Science (AAAS) in Science

Vol. 302 (5644), 427-431
https://doi.org/10.1126/science.1088284

Abstract

While deciphering the Enigma code, Good and Turing derived an unintuitive, yet effective, formula for estimating a probability distribution from a sample of data. We define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet greater than 1. We then derive an estimator whose attenuation is 1; that is, asymptotically it does not underestimate the probability of any sequence.
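The abstract does not state the Good-Turing formula itself. As a rough illustration only, the sketch below uses the basic textbook form of the estimator, which may differ from the exact variant the paper analyzes. A symbol seen r times in a sample of size n is assigned probability (r + 1) * N_{r+1} / (N_r * n), where N_r is the number of distinct symbols seen exactly r times, and the total probability of unseen symbols is estimated as N_1 / n. The function name `good_turing` and the sample string are assumptions made for this example.

```python
from collections import Counter


def good_turing(sample):
    """Basic (unsmoothed) Good-Turing estimates for a sequence of symbols.

    Returns (missing_mass, estimates): missing_mass is the estimated total
    probability of symbols never seen in the sample, and estimates maps
    each observed symbol to its estimated probability.
    """
    n = len(sample)
    counts = Counter(sample)               # symbol -> number of appearances r
    prevalence = Counter(counts.values())  # r -> N_r, symbols seen exactly r times

    # Estimated probability mass of all unseen symbols: N_1 / n.
    missing_mass = prevalence[1] / n

    # Symbol seen r times gets (r + 1) * N_{r+1} / (N_r * n).
    # Counter returns 0 for absent keys, so N_{r+1} = 0 gives an estimate of 0.
    estimates = {
        symbol: (r + 1) * prevalence[r + 1] / (prevalence[r] * n)
        for symbol, r in counts.items()
    }
    return missing_mass, estimates


# Hypothetical sample: a appears 5 times, b and r twice, c and d once.
sample = list("abracadabra")
missing, probs = good_turing(sample)
```

In this sample the raw formula gives the most frequent symbol an estimate of 0, because no symbol appears exactly one more time than it does, and the estimates do not sum to 1. Practical variants therefore smooth the N_r values before applying the formula.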
