Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions
- 9 December 2000
- journal article
- Published by Informa UK Limited in Human–Computer Interaction
- Vol. 15 (4), 263-322
- https://doi.org/10.1207/s15327051hci1504_1
Abstract
The growing interest in multimodal interface design is inspired in large part by the goal of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, be usable by a broader spectrum of everyday users, and function more reliably under realistic and challenging usage conditions. In this article, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches and the new hybrid symbolic-statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error-handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multiperson use. Before this new class of systems can proliferate, toolkits also will be needed to promote software development for both simulated and functioning systems.
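The late (semantic-level) fusion approach mentioned in the abstract can be illustrated with a minimal, hypothetical sketch: each modality recognizer emits an n-best list of scored partial interpretations, and the fusion step pairs them, keeps semantically compatible pairs, and ranks the merged interpretations by joint score. All names, slot structures, and scores below are invented for illustration; they are not from the article.

```python
from itertools import product

def late_fusion(speech_nbest, gesture_nbest, compatible):
    """Merge scored speech and gesture hypotheses (late semantic fusion).

    Each n-best list holds (semantic_frame, score) pairs. Compatible pairs
    are merged (gesture slots fill open speech slots) and ranked by the
    product of their unimodal scores.
    """
    joint = []
    for (s_sem, s_score), (g_sem, g_score) in product(speech_nbest, gesture_nbest):
        if compatible(s_sem, g_sem):
            joint.append(({**s_sem, **g_sem}, s_score * g_score))
    return sorted(joint, key=lambda pair: pair[1], reverse=True)

# Toy example: "move that" plus a pointing gesture on a map display.
speech = [({"action": "move", "object": "?"}, 0.8),
          ({"action": "mark", "object": "?"}, 0.2)]
gesture = [({"object": "unit_7"}, 0.9),
           ({"object": "unit_3"}, 0.1)]

def compatible(s, g):
    # A speech frame with an unfilled object slot can merge with a
    # gesture frame that supplies one.
    return s.get("object") == "?" and "object" in g

ranked = late_fusion(speech, gesture, compatible)
# Top hypothesis: {"action": "move", "object": "unit_7"} with score 0.72
```

Real systems of this kind typically replace the score product with trained statistical weights and the compatibility test with typed feature-structure unification, but the ranking-over-merged-hypotheses shape is the same.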