A Novel Consistent Random Forest Framework: Bernoulli Random Forests

Abstract
Random forests (RFs) are recognized as one type of ensemble learning method and are effective for the most classification and regression tasks. Despite their impressive empirical performance, the theory of RFs has yet been fully proved. Several theoretically guaranteed RF variants have been presented, but their poor practical performance has been criticized. In this paper, a novel RF framework is proposed, named Bernoulli RFs (BRFs), with the aim of solving the RF dilemma between theoretical consistency and empirical performance. BRF uses two independent Bernoulli distributions to simplify the tree construction, in contrast to the RFs proposed by Breiman. The two Bernoulli distributions are separately used to control the splitting feature and splitting point selection processes of tree construction. Consequently, theoretical consistency is ensured in BRF, i.e., the convergence of learning performance to optimum will be guaranteed when infinite data are given. Importantly, our proposed BRF is consistent for both classification and regression. The best empirical performance is achieved by BRF when it is compared with state-of-the-art theoretical/consistent RFs. This advance in RF research toward closing the gap between theory and practice is verified by the theoretical and experimental studies in this paper.
Funding Information
  • National Natural Science Foundation of China (61371078, 61375054)
  • R&D Program of Shenzhen (JCYJ20140509172959977, JSGG20150512162853495, ZDSYS20140509172959989, JCYJ20160331184440545)
  • Australian Research Council (DP140100545, DP140102206)
  • Program for Professor of Special Appointment (Eastern Scholar) at the Shanghai Institutions of Higher Learning