Abstract
We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum a posteriori (MAP) estimator. The prior is a bias for maximally structured and minimally ambiguous models. In conditional probability models with hidden state, iterative MAP estimation drives weakly supported parameters toward extinction, effectively turning them off. Thus, structure discovery is folded into parameter estimation. We then establish criteria for simplifying a probabilistic model's graphical structure by trimming parameters and states, with a guarantee that any such deletion will increase the posterior probability of the model. Trimming accelerates learning by sparsifying the model. All operations monotonically and maximally increase the posterior probability, yielding structure-learning algorithms only slightly slower than parameter estimation via expectation-maximization and orders of magnitude faster than search-based structure induction. When applied to hidden Markov model training, the resulting models show superior generalization to held-out test data. In many cases the resulting models are so sparse and concise that they are interpretable, with hidden states that strongly correlate with meaningful categories.
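The abstract itself gives no formulas, so the following is only a minimal illustrative sketch of a Brand-style entropic-prior MAP update for a single multinomial, under the assumption that the prior is P(θ) ∝ e^{-H(θ)} = ∏_i θ_i^{θ_i}; stationarity of the log posterior then gives ω_i/θ_i + log θ_i + 1 + λ = 0, solved by θ_i = -ω_i / W_{-1}(-ω_i e^{1+λ}) with the Lambert W function and a Lagrange multiplier λ chosen so the θ_i sum to one. The function name `entropic_map` and the bisection over λ are illustrative choices, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of an entropic-prior MAP estimate for one
# multinomial with strictly positive evidence counts omega.
import numpy as np
from scipy.special import lambertw

def entropic_map(omega, tol=1e-10, iters=200):
    """MAP multinomial parameters under the assumed prior P(theta) ∝ exp(-H(theta))."""
    omega = np.asarray(omega, dtype=float)

    def theta_of(lam):
        # theta_i = -omega_i / W_{-1}(-omega_i * e^{1+lambda}); the W_{-1} branch
        # keeps each theta_i in (0, omega_i], and the argument must stay >= -1/e.
        arg = -omega * np.exp(1.0 + lam)
        return -omega / lambertw(arg, k=-1).real

    # Feasibility of the W_{-1} branch requires omega_i * e^{1+lambda} <= 1/e for all i.
    hi = -2.0 - np.log(omega.max())
    lo = hi - 10.0
    while theta_of(lo).sum() > 1.0:      # expand lower bracket until sum(theta) < 1
        lo -= 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if theta_of(mid).sum() > 1.0:    # sum(theta) increases monotonically with lambda
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    theta = theta_of(0.5 * (lo + hi))
    return theta / theta.sum()           # renormalize to absorb residual bisection error

# Example: compared with the maximum-likelihood estimate omega / omega.sum(),
# the weakly supported third component is pushed further toward zero ("extinction").
print(entropic_map([8.0, 3.0, 0.05]))
```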
