Punish/Reward: Learning with a Critic in Adaptive Threshold Systems

1 September 1973

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics

Vol. SMC-3 (5), 455-465
https://doi.org/10.1109/tsmc.1973.4309272

Abstract

An adaptive threshold element is able to "learn" a strategy of play for the game blackjack (twenty-one) with a performance close to that of the Thorp optimal strategy although the adaptive system has no prior knowledge of the game and of the objective of play. After each winning game the decisions of the adaptive system are "rewarded." After each losing game the decisions are "punished." Reward is accomplished by adapting while accepting the actual decision as the desired response. Punishment is accomplished by adapting while taking the desired response to be the opposite of that of the actual decision. This learning scheme is unlike "learning with a teacher" and unlike "unsupervised learning." It involves "bootstrap adaptation" or "learning with a critic." The critic rewards decisions which are members of successful chains of decisions and punishes other decisions. A general analytical model for learning with a critic is formulated and analyzed. The model represents bootstrap learning per se. Although the hypotheses on which the model is based do not perfectly fit blackjack learning, it is applied heuristically to predict adaptation rates with good experimental success. New applications are being explored for bootstrap learning in adaptive controls and multilayered adaptive systems.
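
The reward and punishment rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a single threshold element with a ±1 output, and it assumes a normalized LMS (Widrow-Hoff) step as the adaptation rule, which the abstract does not specify. The names `dot`, `decide`, `critic_update`, and the step size `mu` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def decide(weights, x):
    """Threshold element: +1 if the weighted sum is non-negative, else -1."""
    return 1 if dot(weights, x) >= 0 else -1


def critic_update(weights, episode, won, mu=0.1):
    """Adapt once for every decision made during one finished game.

    episode: list of (input_vector, actual_decision) pairs, decisions in {+1, -1}.
    won:     True  -> reward: the desired response is the actual decision.
             False -> punish: the desired response is the opposite decision.
    mu:      step size (assumed; a normalized LMS step is used here).
    Input vectors must be non-zero, e.g. by including a constant bias input.
    """
    w = list(weights)  # leave the caller's weights untouched
    for x, decision in episode:
        desired = decision if won else -decision
        error = desired - dot(w, x)       # error on the linear (pre-threshold) sum
        scale = mu * error / dot(x, x)    # normalize by input power
        w = [wi + scale * xi for wi, xi in zip(w, x)]
    return w
```

For example, `critic_update(w, episode, won=True)` moves the weighted sum for each recorded input toward the decision that was actually taken, while `won=False` moves it toward the opposite decision. The critic supplies only the game outcome, so every decision in the chain receives the same reward or punishment.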

This publication has 26 references indexed in Scilit.

Cited by 197 articles