Knowledge-based derivation of document logical structure
- 19 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of 3rd International Conference on Document Analysis and Recognition
Abstract
The analysis of a document image to derive a symbolic description of its structure and contents involves using spatial domain knowledge to classify the different printed blocks (e.g., text paragraphs), group them into logical units (e.g., newspaper stories), and determine the reading order of the text blocks within each unit. These steps describe the conversion of the physical structure of a document into its logical structure. We have developed a computational model for document logical structure derivation, in which a rule-based control strategy utilizes the data obtained from analyzing a digitized document image, and makes inferences using a multi-level knowledge base of document layout rules. The knowledge-based document logical structure derivation system (DeLoS) based on this model consists of a hierarchical rule-based control system to guide the block classification, grouping and read-ordering operations; a global data structure to store the document image data and incremental inferences; and a domain knowledge base to encode the rules governing document layout.
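The three operations named in the abstract (block classification, grouping, and read-ordering) can be illustrated with a minimal Python sketch. This is not the DeLoS implementation: the `Block` fields, the font-height thresholds, the "headline starts a unit" grouping rule, and the column-wise reading order are all hypothetical stand-ins chosen only to show how a small rule base, a shared data structure, and a control routine might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Block:
    """One printed block from page segmentation (hypothetical fields, pixels)."""
    block_id: int
    x: int
    y: int
    width: int
    height: int
    font_height: int
    label: Optional[str] = None  # inference stored by classify()
    unit: Optional[int] = None   # inference stored by group()


# Toy knowledge base: (label, predicate) rules tried in order.
# The thresholds are illustrative assumptions, not rules from the paper.
Rule = Tuple[str, Callable[[Block], bool]]
KNOWLEDGE_BASE: List[Rule] = [
    ("headline", lambda b: b.font_height >= 24),
    ("caption", lambda b: b.font_height <= 8),
    ("paragraph", lambda b: True),  # fallback rule always fires
]


def classify(blocks: List[Block]) -> None:
    """Label each block with the first rule whose predicate holds."""
    for b in blocks:
        b.label = next(label for label, pred in KNOWLEDGE_BASE if pred(b))


def group(blocks: List[Block]) -> None:
    """Assumed rule: each headline starts a logical unit; any other block joins
    the lowest headline above it whose horizontal extent overlaps its own."""
    headlines = [b for b in blocks if b.label == "headline"]
    for i, h in enumerate(headlines):
        h.unit = i
    for b in blocks:
        if b.label == "headline":
            continue
        above = [h for h in headlines
                 if h.y <= b.y and h.x < b.x + b.width and b.x < h.x + h.width]
        if above:
            b.unit = max(above, key=lambda h: h.y).unit


def reading_order(blocks: List[Block]) -> Dict[int, List[int]]:
    """Within each unit: headline first, then left-to-right, top-to-bottom."""
    units: Dict[int, List[Block]] = {}
    for b in blocks:
        if b.unit is not None:
            units.setdefault(b.unit, []).append(b)
    return {
        u: [b.block_id for b in
            sorted(members, key=lambda b: (b.label != "headline", b.x, b.y))]
        for u, members in units.items()
    }


def derive_logical_structure(blocks: List[Block]) -> Dict[int, List[int]]:
    """Control routine: run the three operations in sequence on shared data."""
    classify(blocks)
    group(blocks)
    return reading_order(blocks)
```

Here the list of `Block` objects plays the role of the global data structure: each stage writes its inferences (`label`, `unit`) back onto the blocks so that later stages can use them. A real system would need a much richer rule base and a control strategy that can revise earlier inferences.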
This publication has 3 references indexed in Scilit:
- Understanding multi-articled documents. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- A rule-based system for document image segmentation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- The document spectrum for page layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993