Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker
- 1 September 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Biotechnology
- Vol. 38 (9), 1087-+
- https://doi.org/10.1038/s41587-020-0502-7
Abstract
Small molecules are usually compared by their chemical structure, but there is no unified analytic framework for representing and comparing their biological activity. We present the Chemical Checker (CC), which provides processed, harmonized and integrated bioactivity data on ~800,000 small molecules. The CC divides data into five levels of increasing complexity, from the chemical properties of compounds to their clinical outcomes. In between, it includes targets, off-targets, networks and cell-level information, such as omics data, growth inhibition and morphology. Bioactivity data are expressed in a vector format, extending the concept of chemical similarity to similarity between bioactivity signatures. We show how CC signatures can aid drug discovery tasks, including target identification and library characterization. We also demonstrate the discovery of compounds that reverse and mimic biological signatures of disease models and genetic perturbations in cases that could not be addressed using chemical information alone. Overall, the CC signatures facilitate the conversion of bioactivity data to a format that is readily amenable to machine learning methods.
The biological activities of >800,000 small molecules are represented within a uniform framework.
This publication has 94 references indexed in Scilit.
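The abstract describes bioactivity data expressed as vectors, so that compounds can be compared by the similarity of their signatures rather than by structure alone. A minimal sketch of that idea follows. It assumes cosine similarity as the comparison metric (the abstract does not name one), and the compound names, the 4-dimensional vectors and the helper functions are all hypothetical illustrations, not the Chemical Checker's actual data or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero signature as uninformative
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, signatures):
    """Return (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in signatures.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy signatures; real signatures would be much longer.
signatures = {
    "compound_A": [0.9, 0.1, -0.3, 0.5],
    "compound_B": [0.8, 0.2, -0.2, 0.4],
    "compound_C": [-0.7, 0.6, 0.4, -0.5],
}

query = signatures["compound_A"]
candidates = {k: v for k, v in signatures.items() if k != "compound_A"}
ranking = rank_by_similarity(query, candidates)

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data, compound_B ranks first because its signature points in nearly the same direction as the query, while compound_C receives a negative score. Under the cosine assumption, a strongly negative score is one plausible way to flag a signature that opposes the query, loosely analogous to the "reverse" case mentioned in the abstract.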