BDT
- 13 August 2017
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 1893-1901
- https://doi.org/10.1145/3097983.3098175
Abstract
In this paper we present gradient boosted decision tables (BDTs). A d-dimensional decision table is essentially a mapping from a sequence of d boolean tests to a real value in ℝ. We propose novel algorithms to fit decision tables. Our thorough empirical study suggests that decision tables are better weak learners in the gradient boosting framework and can improve the accuracy of the boosted ensemble. In addition, we develop an efficient data structure to represent decision tables and propose a novel fast algorithm to improve the scoring efficiency for boosted ensembles of decision tables. Experiments on public classification and regression datasets demonstrate that our method is able to achieve 1.5x to 6x speedups over the boosted regression trees baseline. We complement our experimental evaluation with a bias-variance analysis that explains how different weak models influence the predictive power of the boosted ensemble. Our experiments suggest gradient boosting with randomly backfitted decision tables distinguishes itself as the most accurate method on a number of classification and regression problems. We have deployed a BDT model to the LinkedIn news feed system and achieved a significant lift on key metrics.
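The abstract's definition of a d-dimensional decision table can be made concrete with a minimal sketch (an illustration only, not the paper's actual data structure or fitting algorithm): the outcomes of d boolean tests are packed into a d-bit index into an array of 2^d real values, which is what makes table lookup cheap at scoring time.

```python
# Minimal sketch of a d-dimensional decision table: d boolean tests
# index into a flat array of 2**d real-valued outputs. The class name,
# test representation, and example values are illustrative assumptions.

class DecisionTable:
    def __init__(self, tests, values):
        # tests: list of d predicates x -> bool
        # values: list of 2**d floats, one per combination of outcomes
        assert len(values) == 2 ** len(tests)
        self.tests = tests
        self.values = values

    def predict(self, x):
        # Evaluate all d tests and pack the boolean outcomes into
        # a single integer index (first test is the high-order bit).
        idx = 0
        for t in self.tests:
            idx = (idx << 1) | int(t(x))
        return self.values[idx]

# Example: a 2-dimensional table over two feature thresholds.
table = DecisionTable(
    tests=[lambda x: x[0] > 0.5, lambda x: x[1] > 0.5],
    values=[0.1, 0.2, 0.3, 0.4],
)
print(table.predict([0.9, 0.1]))  # outcomes (True, False) -> index 0b10 -> 0.3
```

Unlike a depth-d decision tree, every example answers the same d tests, so an ensemble of such tables can be scored with bitwise operations over the test outcomes, which is the basis of the scoring speedups the paper reports.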