XGBoost: A Scalable Tree Boosting System
- 13 August 2016
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
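At the core of the system described in the abstract is a regularized second-order tree-boosting objective, which yields a closed-form optimal leaf weight and a split-gain criterion used to grow each tree. The sketch below (plain Python, not the library's API; the function names and toy gradient values are illustrative) computes these two quantities from per-example gradients g_i and Hessians h_i, with L2 penalty lam and leaf-complexity penalty gamma:

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight w* = -G / (H + lam), where G and H are the
    sums of gradients and Hessians of the examples in the leaf."""
    return -G / (H + lam)

def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Reduction in the regularized objective from splitting a node into
    left/right children:
      0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam)
             - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma
    """
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

# Toy example: squared-error objective, so g_i = prediction - target
# and h_i = 1 for every example (illustrative values).
grads = [-2.0, -1.5, 0.5, 1.0, 2.0]
hess = [1.0] * 5
split = 2  # candidate split: first two examples go to the left child
G_L, H_L = sum(grads[:split]), sum(hess[:split])
G_R, H_R = sum(grads[split:]), sum(hess[split:])
print(split_gain(G_L, H_L, G_R, H_R))            # larger gain = better split
print(leaf_weight(G_L, H_L), leaf_weight(G_R, H_R))
```

The exact tree learner enumerates candidate splits over sorted feature values and keeps the one maximizing this gain; the paper's weighted quantile sketch replaces the full enumeration with a small set of candidate thresholds for approximate learning.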
Funding Information
- Office of Naval Research (N000141010672)
- National Science Foundation (1258741)