What Size Net Gives Valid Generalization?

1 March 1989

journal article
Published by MIT Press in Neural Computation

Vol. 1 (1), 151-160
https://doi.org/10.1162/neco.1989.1.1.151

Abstract

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < ε ≤ 1/8. We show that if m ≥ O((W/ε) log(N/ε)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 − ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 − ε of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/ε) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 − ε fraction of the future test examples.
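
To give a feel for the scale of these bounds, the following is a minimal sketch that evaluates the two orders of growth stated in the abstract. The abstract gives only asymptotic orders, so the multiplicative constant `c` (default 1.0) is a hypothetical placeholder and not a value taken from the paper; the function names are likewise illustrative, and the logarithm is assumed to be the natural logarithm.

```python
import math


def upper_bound_order(W, N, eps, c=1.0):
    """Order of the sufficient sample size: c * (W/eps) * log(N/eps).

    W   -- number of weights in the network
    N   -- number of nodes in the network
    eps -- target error, with 0 < eps <= 1/8 as assumed in the abstract
    c   -- hypothetical constant; the abstract states only the O(.) order
    """
    if not 0 < eps <= 1 / 8:
        raise ValueError("the abstract assumes 0 < eps <= 1/8")
    return math.ceil(c * (W / eps) * math.log(N / eps))


def lower_bound_order(W, eps, c=1.0):
    """Order of the necessary sample size: c * W/eps (one hidden layer).

    c is again a hypothetical constant standing in for the Omega(.) order.
    """
    if not 0 < eps <= 1 / 8:
        raise ValueError("the abstract assumes 0 < eps <= 1/8")
    return math.ceil(c * W / eps)


if __name__ == "__main__":
    W, N, eps = 1000, 100, 0.125
    print("sufficient (order):", upper_bound_order(W, N, eps))
    print("necessary (order): ", lower_bound_order(W, eps))
```

With c fixed at 1 in both functions, the two quantities differ only by the factor log(N/ε), so the necessary and sufficient orders are within a logarithmic factor of each other; the actual gap depends on the constants, which this sketch does not model.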

Keywords

PROBABILITY DISTRIBUTION
