Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms
- 9 February 2011
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 297-306
- https://doi.org/10.1145/1935826.1935878
Abstract
Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging due to their "partial-label" nature. Common practice is to create a simulator that mimics the online environment for the problem at hand and then run an algorithm against this simulator. However, building the simulator itself is often difficult, and modeling bias is usually introduced unavoidably. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method.
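The replay methodology described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each logged event was recorded under a uniformly random logging policy and stores a context, the candidate arms, the arm actually shown, and the observed reward. The function and variable names are purely illustrative.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a bandit policy's average reward from logged data (sketch).

    Each logged event is (context, arms, logged_arm, reward), where
    logged_arm was chosen uniformly at random by the logging policy.
    The evaluated policy is credited only on events where its choice
    matches the logged arm; all other events are discarded.
    """
    total_reward = 0.0
    matches = 0
    for context, arms, logged_arm, reward in logged_events:
        if policy(context, arms) == logged_arm:  # keep only matching events
            total_reward += reward
            matches += 1
    # Average reward over retained events (0.0 if no event matched).
    return total_reward / matches if matches else 0.0
```

For example, a policy that always picks arm "a" is scored only on logged events where "a" happened to be shown, so its estimate reflects the rewards observed for "a"; under uniform logging this estimator is unbiased, which is the property the paper proves.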