Multilingual Artificial Text Detection Using a Cascade of Transforms

Abstract
This paper presents a method for detecting and extracting multilingual artificial text from still images. The detection scheme applies a cascade of spatial transforms, followed by a box-counting-based fractal dimension analysis that exploits the self-similar, redundant patterns in character shapes. Detected text regions are validated using gray-level co-occurrence matrix (GLCM) based features and then segmented from the background with a proposed binarization scheme. The method is evaluated on five data sets containing text in Urdu, English, Chinese, Arabic and Hindi. The experimental results show promising precision and recall rates that are consistent across the data sets.
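To make the box-counting step concrete, the following is a minimal sketch of a generic box-counting fractal dimension estimate on a binary mask. It is not the authors' implementation: the abstract does not specify the transforms in the cascade, the box sizes, or how the mask is obtained, so the function names `box_count` and `box_counting_dimension` and the default `sizes` are assumptions chosen only for illustration.

```python
import numpy as np


def box_count(mask, size):
    """Count the size x size boxes that contain at least one foreground pixel."""
    h, w = mask.shape
    # Pad with background so both dimensions are multiples of the box size.
    pad_h = (-h) % size
    pad_w = (-w) % size
    padded = np.pad(mask, ((0, pad_h), (0, pad_w)),
                    mode="constant", constant_values=False)
    rows, cols = padded.shape
    # View the image as a grid of boxes, then test each box for any foreground.
    blocks = padded.reshape(rows // size, size, cols // size, size)
    return int(blocks.any(axis=(1, 3)).sum())


def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Estimate the fractal dimension as the slope of log N(s) against log(1/s).

    The box sizes are an illustrative assumption, not values from the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    counts = np.array([box_count(mask, s) for s in sizes], dtype=float)
    if np.any(counts == 0):
        raise ValueError("mask contains no foreground pixels")
    scales = np.log(1.0 / np.asarray(sizes, dtype=float))
    # Least-squares line fit; the first coefficient is the slope.
    slope, _intercept = np.polyfit(scales, np.log(counts), 1)
    return float(slope)
```

As a sanity check on the estimator, a completely filled square yields a dimension of about 2 and a single straight row of pixels yields about 1. Character strokes would be expected to fall between these extremes, which is the kind of property a detector could threshold on, although the decision rule used in the paper is not stated in the abstract.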
