Setting up a large set of protein-ligand PDB complexes for the development and validation of knowledge-based docking algorithms
Open Access
- 25 August 2007
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 8 (1), 310
- https://doi.org/10.1186/1471-2105-8-310
Abstract
Background: The number of algorithms available to predict ligand-protein interactions is large and ever-increasing, yet the number of test cases used to validate these methods is usually small and problem dependent. Several databases built on the Protein Data Bank have recently been released to further the understanding of protein-ligand interactions. Nevertheless, it remains difficult to test docking methods on a large variety of complexes. In this paper we report the development of a new database of protein-ligand complexes tailored for testing docking algorithms.
Methods: Using a new definition of molecular contact, small ligands contained in the 2005 PDB edition were identified and processed. The database was enriched with molecular properties; in particular, an automated typing of ligand atoms was performed. A filtering procedure was applied to select a non-redundant dataset of complexes. Data mining was performed to obtain information on the frequencies of different types of atomic contacts. Docking simulations were run with the program DOCK.
Results: We compiled a large database of small ligand-protein complexes, enriched with different calculated properties, that currently contains more than 6000 non-redundant structures. As an example to demonstrate the value of the new database, we derived a new set of chemical matching rules to be used in the context of the program DOCK, based on contact frequencies between ligand atoms and points representing the protein surface, and showed that they are more efficient than the default set of rules included in that program.
Conclusion: The new database constitutes a valuable resource for the development of knowledge-based docking algorithms and for testing docking programs on large sets of protein-ligand complexes. The new chemical matching rules proposed in this work significantly increase the success rate of docking simulations run with DOCK.
The database developed in this work is available at http://cimlcsext.cim.sld.cu:8080/screeningbrowser/.
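To make the idea of contact frequencies concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a plain distance cutoff as the contact criterion (the abstract does not spell out the paper's new definition of molecular contact), and the type labels, coordinates, cutoff value, and the function names count_contacts and contact_frequencies are all hypothetical.

```python
from collections import Counter
from math import dist


def count_contacts(ligand_atoms, surface_points, cutoff=4.0):
    """Count (ligand type, surface point type) pairs within `cutoff` angstroms.

    Each entry is a (type_label, (x, y, z)) tuple. The simple distance
    cutoff is an assumption of this sketch, not the paper's definition.
    """
    counts = Counter()
    for lig_type, lig_xyz in ligand_atoms:
        for pt_type, pt_xyz in surface_points:
            if dist(lig_xyz, pt_xyz) <= cutoff:
                counts[(lig_type, pt_type)] += 1
    return counts


def contact_frequencies(counts):
    """Normalize raw contact counts to relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy complex: two typed ligand atoms, three typed surface points.
ligand = [("donor", (0.0, 0.0, 0.0)), ("hydrophobic", (5.0, 0.0, 0.0))]
surface = [
    ("acceptor", (3.0, 0.0, 0.0)),
    ("hydrophobic", (8.5, 0.0, 0.0)),
    ("acceptor", (20.0, 0.0, 0.0)),
]

counts = count_contacts(ligand, surface)
freqs = contact_frequencies(counts)
```

Summing such counts over many complexes would give the kind of pairwise statistics from which matching rules could be derived; how the paper turns frequencies into rules for DOCK is not described in the abstract.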