A meta-analysis of research in random forests for classification

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE) in 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech)

Abstract

Since their introduction, random forests (RFs) have been employed successfully in a wide range of application areas. More recently, a number of algorithms related to Breiman's original Forest-RI algorithm have been proposed in the literature. In this paper we conduct a meta-analysis of all 34 papers from 2001 to 2015 that could be found in which a novel RF algorithm was proposed and compared with already established RF algorithms. The analysis revealed several limitations regarding the choice of performance measures, the way in which these measures are estimated, and the methodology for comparing multiple algorithms over multiple data sets. In fact, in almost a third of the results from RF research papers, no significant improvement over the performance of Forest-RI is found when comparisons are made using appropriate statistical tests.

Cited by 16 articles