Open-source platform to benchmark fingerprints for ligand-based virtual screening
Open Access
- 30 May 2013
- journal article
- Published by Springer Science and Business Media LLC in Journal of Cheminformatics
- Vol. 5 (1), 26
- https://doi.org/10.1186/1758-2946-5-26
Abstract
Similarity-search methods using molecular fingerprints are an important tool for ligand-based virtual screening. A huge variety of fingerprints exist and their performance, usually assessed in retrospective benchmarking studies using data sets with known actives and known or assumed inactives, depends largely on the validation data sets used and the similarity measure used. Comparing new methods to existing ones in any systematic way is rather difficult due to the lack of standard data sets and evaluation procedures. Here, we present a standard platform for the benchmarking of 2D fingerprints. The open-source platform contains all source code, structural data for the actives and inactives used (drawn from three publicly available collections of data sets), and lists of randomly selected query molecules to be used for statistically valid comparisons of methods. This allows the exact reproduction and comparison of results for future studies. The results for 12 standard fingerprints together with two simple baseline fingerprints assessed by seven evaluation methods are shown together with the correlations between methods. High correlations were found between the 12 fingerprints and a careful statistical analysis showed that only the two baseline fingerprints were different from the others in a statistically significant way. High correlations were also found between six of the seven evaluation methods, indicating that despite their seeming differences, many of these methods are similar to each other.
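The retrospective benchmarking workflow described in the abstract can be illustrated with a minimal sketch. It is not the platform's own code: fingerprints are represented here as plain sets of on-bit indices, Tanimoto is assumed as the similarity measure, and ROC AUC and enrichment factor are assumed as two examples of evaluation methods (the abstract does not name the seven it uses). The toy molecules, their bits, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each fingerprint is a set of on-bit indices.
QUERY: Set[int] = {1, 2, 3, 4}
LIBRARY: Dict[str, Set[int]] = {
    "act1": {1, 2, 3, 5},
    "act2": {1, 2, 4, 6, 7},
    "act3": {3, 14, 15, 16},
    "dec1": {1, 8, 9},
    "dec2": {2, 3, 10, 11},
    "dec3": {12, 13},
}
ACTIVES: Set[str] = {"act1", "act2", "act3"}  # known actives; the rest are assumed inactive


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: Set[int], library: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Score every library molecule against the query, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def roc_auc(ranked_labels: List[bool]) -> float:
    """Fraction of (active, inactive) pairs in which the active is ranked higher.

    Assumes the list is ordered best-first and that scores are not tied.
    """
    n_actives = sum(ranked_labels)
    n_inactives = len(ranked_labels) - n_actives
    if n_actives == 0 or n_inactives == 0:
        raise ValueError("need at least one active and one inactive")
    inactives_seen = 0
    pairs_won = 0
    for is_active in ranked_labels:
        if is_active:
            # This active outranks every inactive not yet encountered.
            pairs_won += n_inactives - inactives_seen
        else:
            inactives_seen += 1
    return pairs_won / (n_actives * n_inactives)


def enrichment_factor(ranked_labels: List[bool], fraction: float) -> float:
    """Active rate in the top fraction of the list relative to the overall active rate."""
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_actives == 0:
        raise ValueError("need at least one active")
    n_top = max(1, round(n_total * fraction))
    hits_in_top = sum(ranked_labels[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    ranking = rank_by_similarity(QUERY, LIBRARY)
    labels = [name in ACTIVES for name, _ in ranking]
    for name, score in ranking:
        print(f"{name}: {score:.3f}")
    print("AUC:", roc_auc(labels))
    print("EF (top 50%):", enrichment_factor(labels, 0.5))
```

In a real benchmark the same ranking would be repeated for many query molecules and data sets, and the per-query scores compared across fingerprints; a single query, as here, says little about statistical significance.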