Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms
- 9 February 2011
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 297-306
- https://doi.org/10.1145/1935826.1935878
Abstract
Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging due to their "partial-label" nature. Common practice is to create a simulator that mimics the online environment for the problem at hand and then run an algorithm against this simulator. However, building the simulator itself is often difficult, and modeling bias is usually introduced unavoidably. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method.
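The replay methodology described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each logged event was recorded under a uniformly random logging policy and stores a context, the candidate arms, the arm actually shown, and the observed reward. The function and variable names are purely illustrative.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a bandit policy's average reward from logged data (sketch).

    Each logged event is (context, arms, logged_arm, reward), where
    logged_arm was chosen uniformly at random by the logging policy.
    The evaluated policy is credited only on events where its choice
    matches the logged arm; all other events are discarded.
    """
    total_reward = 0.0
    matches = 0
    for context, arms, logged_arm, reward in logged_events:
        if policy(context, arms) == logged_arm:  # keep only matching events
            total_reward += reward
            matches += 1
    # Average reward over retained events (0.0 if no event matched).
    return total_reward / matches if matches else 0.0
```

For example, a policy that always picks arm "a" is scored only on logged events where "a" happened to be shown, so its estimate reflects the rewards observed for "a"; under uniform logging this estimator is unbiased, which is the property the paper proves.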