The rcdk and cluster R packages applied to drug candidate selection

Open Access

20 January 2020

journal article
research article
Published by Springer Science and Business Media LLC in Journal of Cheminformatics

Vol. 12 (1), 1-8
https://doi.org/10.1186/s13321-019-0405-0

Abstract

The aim of this article is to show how the power of statistics and cheminformatics can be combined in R using two packages: rcdk and cluster. We describe the role of clustering methods in identifying similar structures in a group of 23 molecules according to their fingerprints. The most commonly used method is to group the molecules using a “score” obtained by measuring the average distance between them. This score reflects the similarity or dissimilarity between compounds and, through predictive studies, helps identify active or potentially toxic substances. Clustering is the process by which the common characteristics of a particular class of compounds are identified. In clustering applications, molecular fingerprint similarity is generally measured with the Tanimoto coefficient. Based on the molecular fingerprints, we calculated the molecular distances between the methotrexate molecule and the other 23 molecules in the group and organized them into a matrix. Using these distances and Ward’s method, the molecules were grouped into 3 clusters. Structural similarity between compounds can be inferred from their locations in the cluster map. Because only 5 molecules were included in the methotrexate cluster, we considered that they might have similar properties and could be further tested as potential drug candidates.
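The pipeline summarized above runs from fingerprints to Tanimoto similarity to a distance matrix, and then to Ward clustering cut at 3 clusters. The following minimal Python sketch illustrates these steps. It is not the authors' R code with rcdk and cluster. Several assumptions apply: fingerprints are hypothetical toy sets of on-bit indices rather than fingerprints computed from real structures, distance is taken as 1 minus the Tanimoto coefficient, and the Ward merge step uses the Lance-Williams update applied to squared distances. The names `tanimoto`, `distance_matrix`, `ward_clusters` and `fps` are illustrative only.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def distance_matrix(fps: list[set[int]]) -> list[list[float]]:
    """Pairwise Tanimoto distances (1 - similarity) as a square nested list."""
    n = len(fps)
    return [[1.0 - tanimoto(fps[i], fps[j]) for j in range(n)] for i in range(n)]


def ward_clusters(dist: list[list[float]], k: int) -> list[list[int]]:
    """Agglomerative Ward clustering, stopped when k clusters remain.

    Assumption: the Lance-Williams Ward update is applied to squared distances.
    Returns the clusters as sorted lists of original item indices.
    """
    n = len(dist)
    members = {i: [i] for i in range(n)}  # cluster id -> item indices
    d2 = {(i, j): dist[i][j] ** 2 for i, j in combinations(range(n), 2)}
    next_id = n
    while len(members) > k:
        a, b = min(d2, key=d2.get)  # closest pair of active clusters
        na, nb = len(members[a]), len(members[b])
        new, next_id = next_id, next_id + 1
        for c in list(members):
            if c in (a, b):
                continue
            nc = len(members[c])
            dac = d2[tuple(sorted((a, c)))]
            dbc = d2[tuple(sorted((b, c)))]
            # Lance-Williams update for Ward's criterion
            d2[(c, new)] = ((na + nc) * dac + (nb + nc) * dbc - nc * d2[(a, b)]) / (na + nb + nc)
        members[new] = members.pop(a) + members.pop(b)
        # drop every distance that still refers to the two merged clusters
        d2 = {p: v for p, v in d2.items() if a not in p and b not in p}
    return sorted(sorted(m) for m in members.values())


# Hypothetical toy fingerprints forming three obvious structural families.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21, 22}, {20, 21, 23}]

if __name__ == "__main__":
    dist = distance_matrix(fps)
    print(dist[0])  # distances from molecule 0 (the "query") to every molecule
    print(ward_clusters(dist, 3))  # [[0, 1], [2, 3], [4, 5]]
```

Row 0 of the matrix plays the role of the query-to-group distances (methotrexate in the article). Ward's criterion is defined on squared Euclidean distances, so applying it to squared Tanimoto distances is a heuristic. Other Ward variants, such as those operating on unsquared dissimilarities, can produce different assignments.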
