Predicting Chemical-Induced Liver Toxicity Using High-Content Imaging Phenotypes and Chemical Descriptors: A Random Forest Approach

Abstract
Hepatotoxicity is a major reason for the withdrawal of drugs from the market and for their discontinuation in clinical trials. Better tools are therefore needed to filter out potentially hepatotoxic compounds early in drug discovery. Our study demonstrates the use of high-content imaging (HCI) phenotypes, chemical descriptors, and a combination of both (hybrid descriptors) to construct random forest classifiers (RFCs) for predicting hepatotoxicity. HCI data published by the Broad Institute provided phenotypes for about 30 000 samples measured in multiple replicates. The phenotypes of 346 chemicals, each tested in up to eight replicates, were chosen as the basis for our analysis. We then constructed separate RFC models using HCI phenotypes, chemical descriptors, and hybrid (chemical plus HCI) descriptors. The model built on a selected subset of hybrid descriptors achieved a 5-fold cross-validation (CV) balanced accuracy (BA) of 0.71; within the applicability domain (AD), its BAs on the independent test set and the external test set were 0.61 and 0.60, respectively. The model built on chemical descriptors alone showed similar but consistently lower predictive performance, with a 5-fold CV BA of 0.66 and BAs within the AD of 0.56 on the test set and 0.50 on the external test set. In conclusion, the hybrid and chemical descriptor-based models presented here could serve as a new tool for filtering out hepatotoxic molecules during compound prioritization in drug discovery.
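The evaluation quantities named in the abstract (hybrid descriptors, balanced accuracy, and BA restricted to an applicability domain) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code. It assumes binary labels (1 = hepatotoxic, 0 = non-hepatotoxic), forms hybrid descriptors by simple concatenation (the descriptor-selection step is omitted), and, because the abstract does not say how the AD was defined, approximates it with a hypothetical nearest-neighbour Euclidean distance threshold. The helpers `hybrid_descriptor`, `in_domain`, and `ad_balanced_accuracy` and the `threshold` parameter are illustrative names only. The random forest itself is not shown; predictions from any classifier can be scored this way.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: labels are binary, 1 = hepatotoxic, 0 = non-hepatotoxic.


def hybrid_descriptor(chem: Sequence[float], hci: Sequence[float]) -> List[float]:
    """Hypothetical helper: concatenate a compound's chemical descriptors and
    HCI phenotype features into one hybrid vector (no feature selection here)."""
    return list(chem) + list(hci)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def in_domain(x: Sequence[float], X_train: Sequence[Sequence[float]], threshold: float) -> bool:
    """Hypothetical AD rule: x is inside the AD if its Euclidean distance to the
    nearest training compound is at most `threshold`."""
    return min(math.dist(x, t) for t in X_train) <= threshold


def ad_balanced_accuracy(X_test, y_true, y_pred, X_train, threshold) -> Tuple[float, float]:
    """BA over only the test compounds inside the (hypothetical) AD, plus the
    fraction of test compounds that fall inside it (coverage)."""
    keep = [i for i, x in enumerate(X_test) if in_domain(x, X_train, threshold)]
    if not keep:
        raise ValueError("no test compound falls inside the applicability domain")
    ba = balanced_accuracy([y_true[i] for i in keep], [y_pred[i] for i in keep])
    return ba, len(keep) / len(X_test)


if __name__ == "__main__":
    # Toy data: one chemical feature plus one HCI feature per compound.
    X_train = [hybrid_descriptor([0.0], [0.0]), hybrid_descriptor([1.0], [1.0])]
    X_test = [[0.1, 0.0], [5.0, 5.0], [1.0, 0.9], [0.0, 0.2]]
    y_true = [1, 0, 0, 1]
    y_pred = [1, 1, 0, 0]
    ba, coverage = ad_balanced_accuracy(X_test, y_true, y_pred, X_train, threshold=0.5)
    print(f"BA within AD = {ba:.2f}, coverage = {coverage:.2f}")  # BA within AD = 0.75, coverage = 0.75
```

In the toy data, the compound at (5.0, 5.0) lies far from every training compound and is excluded, so the BA within the AD (0.75) differs from the BA on all four compounds (0.50). Balanced accuracy is used rather than plain accuracy because it is insensitive to class imbalance; a value of 0.5 corresponds to chance-level prediction for a binary classifier.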
Funding Information
  • Stiftelsen för Kunskaps- och Kompetensutveckling (dnr 202100-2924)
  • Örebro Universitet
  • Environmental Forensic Laboratory, Örebro University