Survey on evaluation methods for dialogue systems
Open Access
- 25 June 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in Artificial Intelligence Review
- Vol. 54 (1), 755-810
- https://doi.org/10.1007/s10462-020-09866-x
Abstract
In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation, in and of itself, is a crucial part during the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods which allow a reduction in involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then present the evaluation methods regarding that class.
Funding Information
- CHIST-ERA
- Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (20CH21_174237)
- Agencia Estatal de Investigación (PCIN-2017-118/AEI)
- Agencia Estatal de Investigación (PCIN-2017-085/AEI)
- Agence Nationale de la Recherche (ANR-17-CHR2-0001-03)
This publication has 122 references indexed in Scilit:
- A survey on metrics for the evaluation of user simulations. The Knowledge Engineering Review, 2012
- The Meteor metric for automatic evaluation of machine translation. Machine Translation, 2009
- Does this list contain what you were searching for? Learning adaptive dialogue strategies for interactive question answering. Natural Language Engineering, 2009
- A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies. The Knowledge Engineering Review, 2006
- Towards developing general models of usability with PARADISE. Natural Language Engineering, 2000
- 100 million words of English. English Today, 1993
- Measuring nominal scale agreement among many raters. Psychological Bulletin, 1971
- Speech Acts: An Essay in the Philosophy of Language. Language, 1970
- How to Do Things with Words. Analysis, 1963
- I.—COMPUTING MACHINERY AND INTELLIGENCE. Mind, 1950