Abstract
When interactions are present in medical and health science data, an appropriate statistical approach is needed to avoid erroneous conclusions. For example, a drug's therapeutic effect may differ by gender, and an adverse interaction between two medicines may alter pharmacological activity, reduce the therapeutic effect, or induce toxicity. An analysis that ignores such interactions may therefore produce substantial prediction errors or bias. Regression models handle a two-way interaction by adding the product of the two interacting variables as an extra term. Because machine learning models have often shown better predictive ability than regression models, this study proposes a new random forest-based method that accounts for interactions, called the random interaction forest (RIF). RIF modifies the structure of the random forest by forcing the interacting features into the first two nodes of the trees. Simulation studies compared the predictive ability of the linear regression model, the logistic regression model, the random forest, and RIF under various scenarios. The results showed that RIF consistently outperformed the random forest and logistic regression when interactions were present, and it also outperformed linear regression in many scenarios. The advantage of RIF may be greater when the interaction effect is stronger.
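The abstract notes that regression models capture a two-way interaction by adding the product of the two interacting variables. The stdlib-only Python sketch below illustrates that standard product-term model, y = b0 + b1*x1 + b2*x2 + b3*(x1*x2), fitted by ordinary least squares. It is not the RIF method proposed in the study; the function names, toy data, and coefficients are hypothetical and chosen only for illustration.

```python
"""Sketch: a two-way interaction in linear regression via a product term.

Assumed model (for illustration only):
    y = b0 + b1*x1 + b2*x2 + b3*(x1*x2) + noise
"""
from typing import List, Sequence


def design_row(x1: float, x2: float) -> List[float]:
    """Intercept, two main effects, and their product (the interaction term)."""
    return [1.0, float(x1), float(x2), float(x1) * float(x2)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot magnitude into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interaction_model(
    x1: Sequence[float], x2: Sequence[float], y: Sequence[float]
) -> List[float]:
    """OLS coefficients [b0, b1, b2, b3] via the normal equations (X^T X) b = X^T y."""
    rows = [design_row(a, b) for a, b in zip(x1, x2)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical toy data: x1 is a continuous dose, x2 a 0/1 group indicator.
    x1 = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
    x2 = [0, 0, 0, 0, 1, 1, 1, 1]
    # Noise-free outcome generated with b = (1, 2, -1, 3): the slope of x1
    # is 2 in group 0 and 2 + 3 = 5 in group 1.
    y = [1 + 2 * a - b + 3 * a * b for a, b in zip(x1, x2)]
    print([round(c, 6) for c in fit_interaction_model(x1, x2, y)])
```

Because the product term is included, the fitted slope of x1 can differ between groups (b1 in one group, b1 + b3 in the other). Dropping the term would force a single common slope, which is the kind of misspecification the abstract warns can bias predictions.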
Funding Information
  • National Science and Technology Council (111-2118-M-A49-005)
