Abstract
We present a new database for off-line handwriting recognition, together with a few preprocessing and text segmentation procedures. The database is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that were used to generate forms, which were subsequently filled out by persons in their own handwriting. As of December 1998, the database includes 556 forms produced by approximately 250 different writers. The database consists of full English sentences and could serve as a basis for a variety of handwriting recognition tasks. The main focus, however, is on recognition techniques that use linguistic knowledge beyond the lexicon level. This knowledge can be derived automatically from the corpus or supplied from external sources.