Tunable Machine Vision-Based Strategy for Automated Annotation of Chemical Databases
- 21 July 2009
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling
- Vol. 49 (8), 1993-2001
- https://doi.org/10.1021/ci900029v
Abstract
We present a tunable, machine vision-based strategy for automated annotation of virtual small molecule databases. The strategy combines three components: a machine vision-based tool that extracts structure diagrams from research articles and converts them into connection tables; a virtual “Chemical Expert” system that screens the converted structures against an adjustable level of estimated conversion accuracy; and a fragment-based measure of intermolecular similarity. For annotation, links are established from the calculated chemical similarity between the converted structures and entries in a virtual small molecule database. Overall annotation performance can be tuned by adjusting the cutoff threshold on estimated conversion accuracy. We perform an annotation test that attempts to link 121 journal articles registered in PubMed to entries in PubChem, the largest publicly accessible chemical database. Two tests, using different threshold levels of estimated conversion accuracy, are performed, and their results are compared to assess how the threshold affects overall annotation performance. Our work demonstrates that, in both tests, over 45% of the articles could be given true positive links to entries in the PubChem database, with promising recall and precision rates. Furthermore, we show that the Chemical Expert system, which screens converted structures by adjustable levels of estimated conversion accuracy, is a key factor in overall annotation performance. We propose that this machine vision-based strategy can be combined with text-mining approaches to facilitate the extraction of contextual scientific knowledge about chemical structures from the scientific literature.
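The screening-then-linking step described in the abstract can be illustrated with a minimal Python sketch. It assumes that each converted structure has already been reduced to a set of fragment keys and carries a single estimated-accuracy score from the screening step. The names `ConvertedStructure`, `tanimoto`, and `annotate`, the fragment labels, the identifiers, and the `similarity_cutoff` parameter are all hypothetical. The Tanimoto coefficient over fragment sets is a stand-in for the paper's fragment-based similarity measure, whose exact definition is not given in this abstract.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ConvertedStructure:
    """A structure recovered from an article figure (hypothetical record)."""
    article_id: str             # e.g. a PubMed identifier (assumed field)
    fragments: FrozenSet[str]   # fragment keys from the connection table (assumed)
    est_accuracy: float         # estimated conversion accuracy in [0, 1] (assumed scale)


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| between two fragment sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def annotate(
    converted: List[ConvertedStructure],
    database: Dict[str, FrozenSet[str]],
    accuracy_cutoff: float,
    similarity_cutoff: float,
) -> List[Tuple[str, str, float]]:
    """Return (article_id, database_id, similarity) links."""
    links = []
    for s in converted:
        # Screening: skip structures whose estimated conversion accuracy is
        # below the tunable cutoff (the role the abstract gives the Chemical Expert).
        if s.est_accuracy < accuracy_cutoff:
            continue
        # Linking: compare against every database entry by fragment similarity.
        for db_id, db_frags in database.items():
            sim = tanimoto(s.fragments, db_frags)
            if sim >= similarity_cutoff:
                links.append((s.article_id, db_id, sim))
    return links


if __name__ == "__main__":
    # Toy data with placeholder fragment labels and identifiers.
    db = {
        "CID_A": frozenset({"benzene", "carbonyl", "hydroxyl"}),
        "CID_B": frozenset({"amine", "carbonyl"}),
    }
    structs = [
        ConvertedStructure("PMID_1", frozenset({"benzene", "carbonyl", "hydroxyl"}), 0.9),
        ConvertedStructure("PMID_2", frozenset({"amine", "carbonyl"}), 0.4),
    ]
    print(annotate(structs, db, accuracy_cutoff=0.5, similarity_cutoff=0.8))
    # [('PMID_1', 'CID_A', 1.0)]
    print(annotate(structs, db, accuracy_cutoff=0.3, similarity_cutoff=0.8))
    # [('PMID_1', 'CID_A', 1.0), ('PMID_2', 'CID_B', 1.0)]
```

In this toy run, raising `accuracy_cutoff` from 0.3 to 0.5 screens out the low-confidence structure from `PMID_2` and its link disappears. This is a small-scale picture of how the screening threshold changes which links are produced; it makes no claim about the actual precision or recall behavior reported in the paper.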