Comparing Fingerprints for Ligand-Based Virtual Screening: A Fast and Scalable Approach for Unbiased Evaluation
- 26 October 2020
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling
- Vol. 60 (10), 4536-4545
- https://doi.org/10.1021/acs.jcim.0c00469
Abstract
Ligand-based virtual screening is a useful tool for drug and probe discovery due to its high accessibility and scalability. The recent identification of bias in many data sets that were used in performance evaluation, quantified by the asymmetric validation embedding (AVE) score, has prompted the reanalysis of models to determine which performs best. Based on the understanding that ligand data are made up of blocks of highly correlated instances, we introduce a technique that quickly generates splits with AVE distributed close to zero using a combination of clustering and removal of the most biased data. We used our technique to compare the performance of the Morgan and CATS fingerprints and show that, after debiasing, the implementation of the CATS fingerprint performs significantly better. The code to replicate these results and perform low-bias splits is available at https://github.com/ljmartin/fp_low_ave.
Funding Information
- University of Sydney
- National Health and Medical Research Council (1092046)
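To make the AVE score mentioned in the abstract concrete, the following is a minimal sketch of how such a bias value might be computed for one train/validation split of binary fingerprints. It is not taken from the authors' repository. It assumes the commonly cited form of the score, in which nearest-neighbour coverage is averaged over a grid of distance thresholds and combined as (actives near training actives minus actives near training inactives) plus (inactives near training inactives minus inactives near training actives). The function names, the Jaccard distance, the 101-point threshold grid, and the use of `<=` at each threshold are all assumptions made for illustration.

```python
import numpy as np


def jaccard_distances(a, b):
    """Pairwise Jaccard (1 - Tanimoto) distances between rows of two binary arrays."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    inter = (a[:, None, :] & b[None, :, :]).sum(axis=2)
    union = (a[:, None, :] | b[None, :, :]).sum(axis=2)
    # Assumption: two all-zero fingerprints count as identical (similarity 1).
    sim = np.divide(inter, union,
                    out=np.ones(inter.shape, dtype=float),
                    where=union > 0)
    return 1.0 - sim


def nn_fraction(valid, train, thresholds):
    """Average, over thresholds, of the fraction of `valid` rows whose
    nearest neighbour in `train` lies within the threshold distance."""
    nearest = jaccard_distances(valid, train).min(axis=1)
    return float(np.mean([(nearest <= d).mean() for d in thresholds]))


def ave_bias(valid_act, valid_inact, train_act, train_inact, thresholds=None):
    """Hypothetical helper: AVE-style bias of one split (0 means balanced)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)  # assumed threshold grid
    aa = nn_fraction(valid_act, train_act, thresholds)
    ai = nn_fraction(valid_act, train_inact, thresholds)
    ii = nn_fraction(valid_inact, train_inact, thresholds)
    ia = nn_fraction(valid_inact, train_act, thresholds)
    return (aa - ai) + (ii - ia)


if __name__ == "__main__":
    # Toy data: random 64-bit fingerprints, not real ligands.
    rng = np.random.default_rng(0)
    fps = rng.integers(0, 2, size=(40, 64))
    bias = ave_bias(fps[:10], fps[10:20], fps[20:30], fps[30:40])
    print(f"AVE-style bias of the toy split: {bias:.3f}")
```

A value near zero indicates that validation actives and inactives are, on average, equally close to both training classes. A debiasing procedure such as the one the abstract describes would aim to produce splits for which this quantity is distributed close to zero.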