Robustness of random forests for regression
- 13 September 2012
- journal article
- research article
- Published by Taylor & Francis Ltd in Journal of Nonparametric Statistics
- Vol. 24 (4), 993-1006
- https://doi.org/10.1080/10485252.2012.715161
Abstract
In this paper, we empirically investigate the robustness of random forests for regression problems. We also investigate the performance of six variations of the original random forest method, all aimed at improving robustness. These variations are based on three main ideas: (1) robustifying the aggregation method, (2) robustifying the splitting criterion and (3) taking a robust transformation of the response. More precisely, with the first idea, we use the median (or weighted median), instead of the mean, to combine the predictions from the individual trees. With the second idea, we use least-absolute deviations from the median, instead of least-squares, as the splitting criterion. With the third idea, we build the trees using the ranks of the response instead of the original values. The competing methods are compared via a simulation study with artificial data using two different types of contamination and also with 13 real data sets. Our results show that all three ideas improve the robustness of the original random forest algorithm. However, a robust aggregation of the individual trees is generally more profitable than a robust splitting criterion.
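The three ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`aggregate`, `ls_cost`, `lad_cost`, `best_split`, `ranks`), the toy data, and the choice of a single-feature split search are all assumptions made for illustration, and the weighted median variant is not shown.

```python
from statistics import mean, median


def aggregate(tree_predictions, method="mean"):
    """Combine the per-tree predictions for a single query point."""
    if method == "mean":
        return mean(tree_predictions)
    if method == "median":
        return median(tree_predictions)
    raise ValueError(f"unknown aggregation method: {method}")


def ls_cost(y):
    """Least-squares node cost: sum of squared deviations from the mean."""
    centre = mean(y)
    return sum((v - centre) ** 2 for v in y)


def lad_cost(y):
    """Least-absolute-deviations node cost: absolute deviations from the median."""
    centre = median(y)
    return sum(abs(v - centre) for v in y)


def best_split(x, y, cost):
    """Threshold on one feature that minimises the summed cost of both children."""
    pairs = sorted(zip(x, y))
    best_total, best_threshold = None, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = cost(left) + cost(right)
        if best_total is None or total < best_total:
            best_total = total
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold


def ranks(y):
    """Ranks 1..n of the response (ties broken by order of appearance)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    result = [0] * len(y)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Idea 1: one tree returns a wild prediction (hypothetical values).
tree_predictions = [2.0, 2.1, 1.9, 2.2, 50.0]
mean_prediction = aggregate(tree_predictions, "mean")      # pulled towards 50.0
median_prediction = aggregate(tree_predictions, "median")  # 2.1

# Idea 2: the response steps from 1 to 5 between x = 6 and x = 7,
# with one contaminated observation at x = 4.
x = list(range(1, 11))
y = [1.0, 1.0, 1.0, 50.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
ls_threshold = best_split(x, y, ls_cost)    # 4.5: drawn to the outlier
lad_threshold = best_split(x, y, lad_cost)  # 6.5: the underlying step

# Idea 3: after the rank transformation the outlier is merely the largest rank.
y_ranks = ranks(y)
```

On this toy data the least-squares criterion places the split next to the contaminated point, while the least-absolute-deviations criterion recovers the step in the response. A single small example like this says nothing about which idea performs better in general; that comparison is what the paper's simulation study addresses.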