A survey on metrics for the evaluation of user simulations
- 28 November 2012
- Research article (journal)
- Published by Cambridge University Press (CUP) in The Knowledge Engineering Review
- Vol. 28 (1), 59-73
- https://doi.org/10.1017/s0269888912000343
Abstract
User simulation is an important research area in the field of spoken dialogue systems (SDSs) because collecting and annotating real human–machine interactions is often expensive and time-consuming. However, such data are generally required for designing, training and assessing dialogue systems. User simulations are especially needed when dialogue management strategies are optimized with machine learning methods such as reinforcement learning, where the amount of data needed for training exceeds what existing corpora provide. The quality of the user simulation is therefore of crucial importance, because it dramatically influences the results of SDS performance analysis and the learnt strategy. Assessing the quality of simulated dialogues and user simulation methods remains an open issue: although assessment metrics are required, no commonly adopted metric exists. In this paper, we survey user simulation metrics in the literature, propose some extensions and discuss these metrics in terms of a list of desired features.
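One common family of metrics assessed in this literature compares the statistical distribution of simulated user behaviour against that of real users, for instance via a divergence between dialogue-act distributions. As a minimal illustrative sketch (not the paper's own method; the act labels and smoothing constant here are invented for the example), a Kullback–Leibler divergence between empirical act distributions can be computed as:

```python
import math
from collections import Counter

def act_distribution(dialogues):
    """Empirical distribution of user dialogue acts over a corpus.

    `dialogues` is a list of dialogues, each a list of act labels.
    """
    counts = Counter(act for dialogue in dialogues for act in dialogue)
    total = sum(counts.values())
    return {act: n / total for act, n in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the union of observed acts.

    A small epsilon smooths acts unseen in one corpus; without it,
    an act present in p but absent from q would yield infinity.
    """
    acts = set(p) | set(q)
    return sum(
        p.get(a, 0.0) * math.log((p.get(a, 0.0) + eps) / (q.get(a, 0.0) + eps))
        for a in acts
    )

# Hypothetical toy corpora of user dialogue acts.
real_corpus = [["inform", "confirm", "inform"], ["inform", "bye"]]
sim_corpus = [["inform", "inform", "bye"], ["confirm", "bye"]]

score = kl_divergence(act_distribution(real_corpus),
                      act_distribution(sim_corpus))
```

A score near zero indicates that the simulator reproduces the real users' act frequencies; note that such distribution-level metrics deliberately ignore dialogue context and ordering, which is one of the limitations the survey discusses.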