Random Interaction Forest (RIF)–A Novel Machine Learning Strategy Accounting for Feature Interaction
Open Access
- 29 December 2022
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Access
- Vol. 11, 1806-1813
- https://doi.org/10.1109/access.2022.3233194
Abstract
When an interaction exists in medical and health sciences data, a proper statistical approach is required to avoid erroneous conclusions. For example, the therapeutic effect of a drug may differ by gender, or an adverse interaction between two medicines may change pharmacological activity, reduce the therapeutic effect, or induce toxicity. An analysis that does not account for the interaction may therefore introduce significant prediction error or bias. Regression models handle a two-way interaction by adding the product of the two interacting variables as an extra term. Because machine learning models demonstrate superior predictive ability compared with regression models, this study proposes a new method based on the random forest that accounts for interaction, called the random interaction forest (RIF). The new strategy modifies the structure of the random forest so that the interaction features are forced into the first two nodes. Simulation studies examined the predictive ability of the linear regression model, the logistic regression model, the random forest, and the RIF under various scenarios. The results showed that the RIF consistently outperforms the random forest and logistic regression when interactions are present. The RIF also performs better than the linear regression model in many scenarios. The RIF's advantage may be greater when the interaction effect is stronger.
Funding Information
- National Science and Technology Council (111-2118-M-A49-005)
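To illustrate two ideas from the abstract, the sketch below simulates data with a two-way interaction and (1) fits a linear regression with and without the product term, and (2) computes the four cell means obtained when the first two splits are placed on the two interacting features. This is a minimal sketch under stated assumptions, not the authors' RIF implementation: the binary features `a` and `b`, the coefficients, the noise level, and the helper names `design`, `fit_ols`, `mse`, and `forced_split_means` are all hypothetical choices made for illustration.

```python
import numpy as np


def design(a, b, with_interaction):
    """Design matrix with intercept, a, b and (optionally) the product a*b."""
    cols = [np.ones_like(a, dtype=float), a, b]
    if with_interaction:
        cols.append(a * b)  # the product term models the two-way interaction
    return np.column_stack(cols)


def fit_ols(X, y):
    """Ordinary least-squares coefficients for y ~ X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mse(X, y):
    """In-sample mean squared error of the least-squares fit."""
    residual = y - X @ fit_ols(X, y)
    return float(np.mean(residual ** 2))


def forced_split_means(a, b, y):
    """Mean outcome in each cell when the first split is on a and the second on b.

    Illustrative only: a tree whose top two levels are forced onto two binary
    interacting features partitions the data into these four cells.
    """
    return {
        (i, j): float(y[(a == i) & (b == j)].mean())
        for i in (0, 1)
        for j in (0, 1)
    }


# Simulated data (assumed values): y depends on a, b and their product.
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, n).astype(float)
b = rng.integers(0, 2, n).astype(float)
y = 1.0 + 0.5 * a + 0.5 * b + 2.0 * a * b + rng.normal(0.0, 0.1, n)

mse_main = mse(design(a, b, False), y)    # main effects only
mse_inter = mse(design(a, b, True), y)    # main effects plus product term
coef_inter = fit_ols(design(a, b, True), y)
cells = forced_split_means(a, b, y)

print("MSE without product term:", mse_main)
print("MSE with product term:   ", mse_inter)
print("Estimated interaction coefficient:", coef_inter[3])
print("Cell means after splitting on a, then b:", cells)
```

In this simulated setting the model without the product term cannot represent the extra effect present only when both features equal 1, so its error is larger. Splitting on both features first isolates that cell directly, which is the intuition behind placing the interaction features at the top of each tree.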