Chemical Name to Structure: OPSIN, an Open Source Solution
- 9 March 2011
- journal article
- review article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling
- Vol. 51 (3), 739-753
- https://doi.org/10.1021/ci100384d
Abstract
We have produced an open source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature in a fast and precise manner. This has been achieved using an approach based on a regular grammar. This grammar is used to guide tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed that is operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer-generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, when using general organic chemical nomenclature, high recall (98.7−99.2%). This software can serve as the basis for future open source developments of chemical name interpretation.
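The idea of letting a regular grammar guide tokenization can be illustrated with a toy example. The sketch below is not OPSIN's implementation: the vocabulary, the token class names, the state names, and the `<molecule>` element are all assumptions made up for illustration, and the grammar covers only a handful of simple alkane and alcohol names. It shows how a finite-state grammar restricts which token classes may be tried at each position, and how the accepted tokens can then be recorded as XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy vocabulary, grouped by token class (not OPSIN's real data).
VOCAB = {
    "multiplier": ["di", "tri"],
    "substituent": ["methyl", "ethyl", "chloro"],
    "stem": ["meth", "eth", "prop", "but"],
    "unsaturation": ["ane", "an"],
    "suffix": ["ol"],
}

# Locants such as "2-" or "2,2-" are matched by pattern rather than by lookup.
LOCANT = re.compile(r"\d+(?:,\d+)*-")

# Hypothetical regular grammar written as a finite-state machine:
# state -> {token class allowed next: resulting state}.
GRAMMAR = {
    "start": {"locant": "located", "multiplier": "counted",
              "substituent": "start", "stem": "stem"},
    "located": {"multiplier": "counted", "substituent": "start"},
    "counted": {"substituent": "start"},
    "stem": {"unsaturation": "body"},
    "body": {"suffix": "end"},
    "end": {},
}
ACCEPTING = {"body", "end"}


def candidates(name, pos, token_class):
    """Yield every token of the given class that matches name at pos."""
    if token_class == "locant":
        match = LOCANT.match(name, pos)
        if match:
            yield match.group()
    else:
        for text in VOCAB[token_class]:
            if name.startswith(text, pos):
                yield text


def tokenize(name, pos=0, state="start"):
    """Return a list of (class, text) tokens, or None if the name is rejected.

    Only token classes the grammar allows in the current state are tried,
    and the search backtracks when a choice leads to a dead end.
    """
    if pos == len(name):
        return [] if state in ACCEPTING else None
    for token_class, next_state in GRAMMAR[state].items():
        for text in candidates(name, pos, token_class):
            rest = tokenize(name, pos + len(text), next_state)
            if rest is not None:
                return [(token_class, text)] + rest
    return None


def to_xml(tokens):
    """Record the tokens as child elements of a single XML element."""
    root = ET.Element("molecule")
    for token_class, text in tokens:
        ET.SubElement(root, token_class).text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for name in ["2-chloroethanol", "2,2-dimethylpropane", "dimethane"]:
        tokens = tokenize(name)
        print(name, "->", to_xml(tokens) if tokens else "rejected")
```

In this toy grammar, "dimethane" is rejected even though "di", "meth", and "ane" are all in the vocabulary, because no transition allows a stem directly after a multiplier. A real interpreter would go on to transform the XML step by step into a structure, which this sketch does not attempt.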