The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve Uncertainty

Abstract
The Hearsay-II system, developed during the DARPA-sponsored five-year speech-understanding research program, represents both a specific solution to the speech-understanding problem and a general framework for coordinating independent processes to achieve cooperative problem-solving behavior. As a computational problem, speech understanding reflects a large number of intrinsically interesting issues. Spoken sounds are achieved by a long chain of successive transformations, from intentions, through semantic and syntactic structuring, to the eventually resulting audible acoustic waves. As a consequence, interpreting speech means effectively inverting these transformations to recover the speaker's intention from the sound. At each step in the interpretive process, ambiguity and uncertainty arise. The Hearsay-II problem-solving framework reconstructs an intention from hypothetical interpretations formulated at various levels of abstraction. In addition, it allocates limited processing resources first to the most promising incremental actions. The final configuration of the Hearsay-II system comprises problem-solving components to generate and evaluate speech hypotheses, and a focus-of-control mechanism to identify potential actions of greatest value. Many of these specific procedures reveal novel approaches to speech problems. Most important, the system successfully integrates and coordinates all of these independent activities to resolve uncertainty and control combinatorics. Several adaptations of the Hearsay-II framework have already been undertaken in other problem domains, and it is anticipated that this trend will continue; many future systems necessarily will integrate diverse sources of knowledge to solve complex problems cooperatively.
Discussed in this paper are the characteristics of the speech problem in particular, the special kinds of problem-solving uncertainty in that domain, the structure of the Hearsay-II system developed to cope with that uncertainty, and the relationship between Hearsay-II's structure and those of other speech-understanding systems. The paper is intended for the general computer science audience and presupposes no speech or artificial intelligence background.
