Chemical documents: machine understanding and automated information extraction

20 October 2004

journal article
Published by Royal Society of Chemistry (RSC) in Organic & Biomolecular Chemistry

Vol. 2 (22), 3294-3300
https://doi.org/10.1039/b411033a

Abstract

Automatically extracting chemical information from documents is a challenging task, but an essential one for dealing with the vast quantity of data that is available. The task is least difficult for structured documents, such as chemistry department web pages or the output of computational chemistry programs, but requires increasingly sophisticated approaches for less structured documents, such as chemical papers. The identification of key units of information, such as chemical names, makes the extraction of useful information from unstructured documents possible.
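
The last point of the abstract can be illustrated with a minimal sketch. It is not the method described in the article: it assumes a tiny hypothetical lexicon (`LEXICON`) and a hypothetical helper (`find_chemical_names`) that marks chemical names in free text by case-insensitive dictionary lookup. A realistic system would need a much larger dictionary or a name grammar, because systematic chemical names are open-ended.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
LEXICON = ["acetic acid", "acetic anhydride", "ethanol", "benzene", "sodium chloride"]


def find_chemical_names(text, lexicon=LEXICON):
    """Return (start, end, matched_text) for each lexicon entry found in text."""
    # Try longer entries first so a multi-word name is preferred over any
    # shorter entry that could match at the same position.
    alternatives = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


sentence = "Ethanol was added to a solution of acetic acid in benzene."
spans = find_chemical_names(sentence)
names = [name for _, _, name in spans]
print(names)
```

Once names are located as character spans, the surrounding text (quantities, conditions, roles) can be interpreted relative to them, which is the sense in which such units make further extraction possible.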

Cited by 19 articles