A Novel Consistent Random Forest Framework: Bernoulli Random Forests

15 August 2017

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks and Learning Systems

Vol. 29 (8), 3510-3523
https://doi.org/10.1109/tnnls.2017.2729778

Abstract

Random forests (RFs) are recognized as one type of ensemble learning method and are effective for most classification and regression tasks. Despite their impressive empirical performance, the theory of RFs has yet to be fully proved. Several theoretically guaranteed RF variants have been presented, but their poor practical performance has been criticized. In this paper, a novel RF framework is proposed, named Bernoulli RFs (BRFs), with the aim of solving the RF dilemma between theoretical consistency and empirical performance. In contrast to the RFs proposed by Breiman, BRF uses two independent Bernoulli distributions to simplify tree construction. The two Bernoulli distributions separately control the splitting feature selection and the splitting point selection during tree construction. Consequently, theoretical consistency is ensured in BRF, i.e., learning performance is guaranteed to converge to the optimum as the amount of data grows to infinity. Importantly, the proposed BRF is consistent for both classification and regression. Compared with state-of-the-art theoretically consistent RFs, BRF achieves the best empirical performance. This advance in RF research toward closing the gap between theory and practice is verified by the theoretical and experimental studies in this paper.
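The abstract does not spell out how the two Bernoulli distributions act, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes that each Bernoulli trial switches between a purely random choice and a data-driven choice: the first trial decides whether a single random feature or a random subset of about sqrt(D) features is considered, and the second decides whether the threshold is drawn at random or chosen to minimize Gini impurity. The names `choose_split`, `p1`, and `p2` are hypothetical.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(X, y, feature, threshold):
    """Weighted Gini impurity after splitting on X[i][feature] <= threshold."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def choose_split(X, y, p1, p2, rng):
    """Pick (feature, threshold) for one node using two Bernoulli trials.

    p1, p2 (assumed parameters): probabilities of taking the purely random
    branch for the feature choice and for the split-point choice.
    Returns None if no candidate feature has two distinct values.
    """
    n_features = len(X[0])

    # Bernoulli trial 1: which features are candidates.
    if rng.random() < p1:
        candidates = [rng.randrange(n_features)]        # one random feature
    else:
        k = max(1, int(math.sqrt(n_features)))
        candidates = rng.sample(range(n_features), k)   # random subset

    # Bernoulli trial 2: random threshold or impurity-minimizing threshold.
    random_point = rng.random() < p2

    best = None
    for f in candidates:
        values = sorted({row[f] for row in X})
        if len(values) < 2:
            continue  # constant feature, cannot split
        midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
        thresholds = [rng.choice(midpoints)] if random_point else midpoints
        for t in thresholds:
            score = split_impurity(X, y, f, t)
            if best is None or score < best[0]:
                best = (score, f, t)
    return None if best is None else (best[1], best[2])


# Usage: one informative feature, two classes.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
print(choose_split(X, y, p1=0.0, p2=0.0, rng=random.Random(0)))
```

With `p1 = p2 = 0` the sketch always takes the data-driven branch, and with `p1 = p2 = 1` every split is fully random; intermediate values mix the two behaviors. How this relates to the consistency result is established in the paper itself, not by this sketch.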

Funding Information

National Natural Science Foundation of China (61371078, 61375054)
R&D Program of Shenzhen (JCYJ20140509172959977, JSGG20150512162853495, ZDSYS20140509172959989, JCYJ20160331184440545)
Australian Research Council (DP140100545, DP140102206)
Program for Professor of Special Appointment (Eastern Scholar) at the Shanghai Institutions of Higher Learning

Cited by 56 articles