Setting up a large set of protein-ligand PDB complexes for the development and validation of knowledge-based docking algorithms

Open Access

25 August 2007

journal article
research article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 8 (1), 310
https://doi.org/10.1186/1471-2105-8-310

Abstract

Background: The number of algorithms available to predict ligand-protein interactions is large and ever-increasing. The number of test cases used to validate these methods is usually small and problem dependent. Recently, several databases have been released for further understanding of protein-ligand interactions, having the Protein Data Bank as backend support. Nevertheless, it appears to be difficult to test docking methods on a large variety of complexes. In this paper we report the development of a new database of protein-ligand complexes tailored for testing of docking algorithms.

Methods: Using a new definition of molecular contact, small ligands contained in the 2005 PDB edition were identified and processed. The database was enriched in molecular properties. In particular, an automated typing of ligand atoms was performed. A filtering procedure was applied to select a non-redundant dataset of complexes. Data mining was performed to obtain information on the frequencies of different types of atomic contacts. Docking simulations were run with the program DOCK.

Results: We compiled a large database of small ligand-protein complexes, enriched with different calculated properties, that currently contains more than 6000 non-redundant structures. As an example to demonstrate the value of the new database, we derived a new set of chemical matching rules to be used in the context of the program DOCK, based on contact frequencies between ligand atoms and points representing the protein surface, and proved their enhanced efficiency with respect to the default set of rules included in that program.

Conclusion: The new database constitutes a valuable resource for the development of knowledge-based docking algorithms and for testing docking programs on large sets of protein-ligand complexes. The new chemical matching rules proposed in this work significantly increase the success rate in DOCKing simulations.
The database developed in this work is available at http://cimlcsext.cim.sld.cu:8080/screeningbrowser/.
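
The abstract mentions mining the frequencies of contacts between typed ligand atoms and points representing the protein surface. The abstract does not state the paper's contact definition, so the following is only a minimal sketch that assumes a plain distance cutoff. The function names, the type labels, the 4.0 angstrom cutoff, and the toy coordinates are all hypothetical and are not taken from the paper or from DOCK.

```python
from collections import Counter
from math import dist


def contact_frequencies(ligand_atoms, surface_points, cutoff=4.0):
    """Count (ligand type, surface point type) pairs within `cutoff` angstroms.

    Assumption: a contact is any pair closer than a fixed distance cutoff.
    The paper uses its own contact definition, which is not reproduced here.
    """
    counts = Counter()
    for lig_type, lig_xyz in ligand_atoms:
        for pt_type, pt_xyz in surface_points:
            if dist(lig_xyz, pt_xyz) <= cutoff:
                counts[(lig_type, pt_type)] += 1
    return counts


def relative_frequencies(counts):
    """Normalize raw contact counts so that they sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Toy data (hypothetical): (type label, (x, y, z)) tuples for one complex.
ligand_atoms = [
    ("donor", (0.0, 0.0, 0.0)),
    ("hydrophobic", (5.0, 0.0, 0.0)),
]
surface_points = [
    ("acceptor", (3.0, 0.0, 0.0)),
    ("hydrophobic", (8.0, 0.0, 0.0)),
    ("acceptor", (20.0, 0.0, 0.0)),
]

counts = contact_frequencies(ligand_atoms, surface_points)
freqs = relative_frequencies(counts)
```

Over many complexes, the per-complex counters could be summed, and the type pairs observed most often could then inform which ligand atom types are allowed to match which surface point types.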
