Refinements of MDL and MML Coding

Abstract
We discuss Rissanen's scheme of ‘complete coding’ in which a two-part data code is further shortened by conditioning the second part not only on the estimates, but also on the fact that these estimates were preferred to any others. We show that the scheme does not lead to improved estimates of parameters. The resulting message lengths may validly be employed to select among competing model classes in a global hypothesis space, but not to select a single member of the chosen class. A related coding scheme is introduced in which the message commences by encoding an ancillary statistic, and then states parameter estimates using a code conditioned on this statistic. The use of Jeffreys priors in MDL codes is questioned and the resulting normalization difficulties and violations of the likelihood principle are discussed. We argue that the MDL objective of avoiding Bayesian priors may be better pursued by other means.
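To make the starting point concrete, the following sketch computes the length of a plain two-part message for Bernoulli data: the first part encodes a parameter estimate chosen from a finite grid, and the second part encodes the data using the code implied by that estimate. This is only a generic illustration under assumed conventions (a uniform code over an arbitrary grid of candidate estimates); it is not Rissanen's 'complete coding' construction, which would further shorten the second part by conditioning on the fact that this estimate was preferred to all others. The function name and grid size are hypothetical.

```python
import math

def two_part_length(successes, n, grid=32):
    """Total two-part message length (in nats) for Bernoulli data.

    Part 1: encode the chosen estimate, assumed drawn from a uniform
    grid of candidate values (a stand-in for a discretized prior, not
    the paper's construction).
    Part 2: encode the data with the code implied by that estimate,
    i.e. its negative log-likelihood.
    Returns the (length, estimate) pair minimizing the total.
    """
    best = None
    for k in range(1, grid):                # candidate estimates theta = k/grid
        theta = k / grid
        part1 = math.log(grid - 1)          # uniform code over the candidates
        part2 = -(successes * math.log(theta)
                  + (n - successes) * math.log(1.0 - theta))
        total = part1 + part2
        if best is None or total < best[0]:
            best = (total, theta)
    return best

length, estimate = two_part_length(30, 100)  # estimate lands near 0.3
```

Because the first part here has constant length, minimizing the total simply picks the grid point of highest likelihood; the schemes discussed in the paper differ precisely in how the two parts are constructed and conditioned.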