Statistically Robust Evaluation of Stream-Based Recommender Systems

17 December 2019

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering

Vol. 33 (7), 2971-2982
https://doi.org/10.1109/tkde.2019.2960216

Abstract

Online incremental models for recommendation are now pervasive in both industry and academia. However, there is not yet a standard evaluation methodology for the algorithms that maintain such models. Moreover, the online evaluation methodologies available in the literature generally fall short on statistical validation of results, since such validation is not trivially applicable to stream-based algorithms. We propose a k-fold validation framework for the pairwise comparison of recommendation algorithms that learn from user feedback streams, using prequential evaluation. Our proposal enables continuous statistical testing on adaptive-size sliding windows over the outcome of the prequential process, allowing practitioners and researchers to make decisions in real time based on solid statistical evidence. We present a set of experiments to gain insight into the sensitivity and robustness of two statistical tests, McNemar's test and the Wilcoxon signed-rank test, in a streaming data environment. Our results show that, besides allowing a real-time, fine-grained online assessment, the online versions of the statistical tests are at least as robust as the batch versions, and definitely more robust than a simple prequential single-fold approach.
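As a rough illustration of the idea described in the abstract, the sketch below runs a prequential (test-then-train) comparison of two recommenders and recomputes McNemar's statistic over a sliding window of their outcomes. It is not the authors' implementation. It assumes a fixed-size window rather than the adaptive-size windows the paper proposes, it uses a single fold rather than the k-fold scheme, and it uses the uncorrected McNemar statistic. The classes `MostPopular` and `LastItem`, the `recommend`/`update` interface, and all other names are hypothetical and exist only to make the example runnable.

```python
from collections import Counter, deque

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_1DF_005 = 3.841


def mcnemar_statistic(a_only, b_only):
    """Uncorrected McNemar statistic from the two discordant counts."""
    total = a_only + b_only
    return 0.0 if total == 0 else (a_only - b_only) ** 2 / total


class SlidingMcNemar:
    """Keeps discordant counts over the last `window_size` paired outcomes.

    Assumption: fixed-size window, a simplification of adaptive sizing.
    """

    def __init__(self, window_size, critical_value=CHI2_CRITICAL_1DF_005):
        self.window = deque()
        self.window_size = window_size
        self.critical_value = critical_value
        self.a_only = 0  # events where only algorithm A scored a hit
        self.b_only = 0  # events where only algorithm B scored a hit

    def update(self, hit_a, hit_b):
        # Evict the oldest pair once the window is full.
        if len(self.window) == self.window_size:
            old_a, old_b = self.window.popleft()
            self.a_only -= int(old_a and not old_b)
            self.b_only -= int(old_b and not old_a)
        self.window.append((hit_a, hit_b))
        self.a_only += int(hit_a and not hit_b)
        self.b_only += int(hit_b and not hit_a)
        return mcnemar_statistic(self.a_only, self.b_only)

    def is_significant(self):
        return mcnemar_statistic(self.a_only, self.b_only) > self.critical_value


class MostPopular:
    """Hypothetical toy model: recommends the n most frequent items so far."""

    def __init__(self):
        self.counts = Counter()

    def recommend(self, user, n):
        return [item for item, _ in self.counts.most_common(n)]

    def update(self, user, item):
        self.counts[item] += 1


class LastItem:
    """Hypothetical toy model: recommends only the most recently seen item."""

    def __init__(self):
        self.last = None

    def recommend(self, user, n):
        return [] if self.last is None else [self.last]

    def update(self, user, item):
        self.last = item


def prequential_compare(stream, model_a, model_b, monitor, n=10):
    """Score each (user, item) event with both models, then train on it."""
    for user, item in stream:
        hit_a = item in model_a.recommend(user, n)
        hit_b = item in model_b.recommend(user, n)
        stat = monitor.update(hit_a, hit_b)
        model_a.update(user, item)
        model_b.update(user, item)
        yield stat, monitor.is_significant()


if __name__ == "__main__":
    stream = [("u", "x"), ("u", "x"), ("u", "y"), ("u", "y")]
    monitor = SlidingMcNemar(window_size=100)
    for stat, significant in prequential_compare(
        stream, MostPopular(), LastItem(), monitor, n=1
    ):
        print(stat, significant)
```

Because each event is scored before either model learns from it, the paired hit/miss outcomes can be fed to the test as they arrive, which is what makes a continuously updated significance check possible. A Wilcoxon signed-rank variant would instead compare paired per-window scores rather than per-event discordant counts.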

Funding Information

Fundação para a Ciência e a Tecnologia (UID/EEA/50014/2019)

