XGBoost: A Scalable Tree Boosting System
- 13 August 2016
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
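At the core of the system described in the abstract is a regularized second-order tree-boosting objective, which yields a closed-form optimal leaf weight and a split-gain criterion used to grow each tree. The sketch below (plain Python, not the library's API; the function names and toy gradient values are illustrative) computes these two quantities from per-example gradients g_i and Hessians h_i, with L2 penalty lam and leaf-complexity penalty gamma:

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight w* = -G / (H + lam), where G and H are the
    sums of gradients and Hessians of the examples in the leaf."""
    return -G / (H + lam)

def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Reduction in the regularized objective from splitting a node into
    left/right children:
      0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam)
             - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma
    """
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

# Toy example: squared-error objective, so g_i = prediction - target
# and h_i = 1 for every example (illustrative values).
grads = [-2.0, -1.5, 0.5, 1.0, 2.0]
hess = [1.0] * 5
split = 2  # candidate split: first two examples go to the left child
G_L, H_L = sum(grads[:split]), sum(hess[:split])
G_R, H_R = sum(grads[split:]), sum(hess[split:])
print(split_gain(G_L, H_L, G_R, H_R))            # larger gain = better split
print(leaf_weight(G_L, H_L), leaf_weight(G_R, H_R))
```

The exact tree learner enumerates candidate splits over sorted feature values and keeps the one maximizing this gain; the paper's weighted quantile sketch replaces the full enumeration with a small set of candidate thresholds for approximate learning.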
Funding Information
- Office of Naval Research (N000141010672)
- National Science Foundation (1258741)