Active Learning for Ranking through Expected Loss Optimization
- 30 October 2014
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 27 (5), 1180-1191
- https://doi.org/10.1109/tkde.2014.2365785
Abstract
Learning to rank arises in many data mining applications, ranging from web search engines and online advertising to recommendation systems. In learning to rank, the performance of a ranking model is strongly affected by the number of labeled examples in the training set; on the other hand, obtaining labeled examples for training data is very expensive and time-consuming. This presents a great need for active learning approaches that select the most informative examples for learning to rank; however, the literature on active learning for ranking is still very limited. In this paper, we propose a general active learning framework, expected loss optimization (ELO), for ranking. The ELO framework is applicable to a wide range of ranking functions. Under this framework, we derive a novel algorithm, expected discounted cumulative gain (DCG) loss optimization (ELO-DCG), to select the most informative examples. We then investigate both query-level and document-level active learning for ranking and propose a two-stage ELO-DCG algorithm that incorporates both query and document selection into active learning. Furthermore, we show that the algorithm can flexibly handle the skewed grade distribution problem by modifying the loss function. Extensive experiments on real-world web search data sets demonstrate the great potential and effectiveness of the proposed framework and algorithms.
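The core idea of expected-loss-based selection can be illustrated with a minimal sketch. This is not the authors' exact ELO-DCG algorithm, only an assumed simplification: relevance-grade uncertainty for each query's documents is represented by a set of sampled grade vectors (e.g. from a bootstrapped model ensemble), and the expected DCG loss is approximated as the expected best achievable DCG minus the expected DCG of the ranking induced by the mean predicted grades. Queries with the largest gap are the most informative to label.

```python
import math

def dcg(grades):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def expected_dcg_loss(grade_samples):
    """Approximate expected DCG loss for one query.

    grade_samples: sampled grade vectors (one grade per document),
    e.g. drawn from an ensemble of ranking models. The loss is
    E[DCG of the ideal ranking] minus E[DCG of the ranking induced
    by the mean predicted grades] under the sampled distribution.
    """
    n_docs = len(grade_samples[0])
    # Ranking by mean predicted grade: the model's single best guess.
    mean_grades = [sum(s[d] for s in grade_samples) / len(grade_samples)
                   for d in range(n_docs)]
    order = sorted(range(n_docs), key=lambda d: -mean_grades[d])
    exp_best = exp_model = 0.0
    for sample in grade_samples:
        exp_best += dcg(sorted(sample, reverse=True))
        exp_model += dcg([sample[d] for d in order])
    m = len(grade_samples)
    return exp_best / m - exp_model / m

def select_queries(candidates, k):
    """Pick the k queries with the highest expected DCG loss.

    candidates: dict mapping query id -> list of sampled grade vectors.
    """
    scored = sorted(candidates, key=lambda q: -expected_dcg_loss(candidates[q]))
    return scored[:k]
```

When the ensemble agrees on the document ordering the expected loss is zero, so such queries are skipped; queries where sampled grades disagree about the ordering receive a positive loss and are prioritized for labeling.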
Funding Information
- National Natural Science Foundation of China (61003107, 61221001)
- High Technology Research and Development Program of China (2012AA011702)