Boundary-Aware Arbitrary-Shaped Scene Text Detector With Learnable Embedding Network

30 June 2021

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Multimedia

Vol. 24 (ISSN 1520-9210), pp. 3129-3143
https://doi.org/10.1109/tmm.2021.3093727

Abstract

Benefiting from the popularity of deep learning, scene text detection algorithms have developed rapidly in recent years. Methods that represent text regions with a text segmentation map have proved able to capture arbitrary-shaped text more flexibly and accurately. However, such segmentation-based methods are easily disturbed by text-like background patterns (such as fences and grass) and therefore generally suffer from imprecise boundary details. In this paper, LEMNet is proposed to handle the imprecise boundary problem by guiding the generation of text boundaries with an a priori constraint. In the training stage, a Boundary Segmentation Branch is first constructed to predict a coarse boundary mask for each text instance. Then, by mapping pixels into an embedding space, the proposed Pixel Embedding Branch makes the embedding representations of boundary points more similar to one another while enlarging the feature distance between background points and boundary points. During inference, noise in the coarse boundary segmentation map is suppressed by a Noisy Point Suppression Algorithm that operates on the pixel embedding vectors. In this way, LEMNet generates a more precise boundary description of text regions. To further enhance the distinguishability of boundary features, we propose a Context Enhancement Module that captures feature interactions in different representation subspaces: attention is applied to the features in parallel, and the results are concatenated to generate enhanced features. Extensive experiments on four challenging datasets demonstrate the effectiveness of LEMNet. Specifically, LEMNet achieves F-measures of 85.2%, 87.6% and 85.2% on CTW1500, Total-Text and MSRA-TD500, respectively, setting a new state of the art.
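
The abstract does not spell out how the Noisy Point Suppression Algorithm works, so the following is only an illustrative sketch of the general idea of filtering a coarse boundary mask with pixel embeddings. It assumes that true boundary pixels cluster around a shared embedding and that pixels far from the mean boundary embedding are noise. The function name `suppress_noisy_points`, the mean-based center, and the `threshold` parameter are hypothetical choices for illustration, not details taken from the paper.

```python
import math


def suppress_noisy_points(mask, embeddings, threshold):
    """Drop coarse-boundary pixels whose embedding is far from the mean boundary embedding.

    mask:       2D list of 0/1 values (coarse boundary prediction).
    embeddings: 2D list of equal-length float tuples, one per pixel.
    threshold:  hypothetical Euclidean distance cutoff (not a value from the paper).
    """
    # Collect coordinates of all pixels currently marked as boundary.
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        return [row[:] for row in mask]

    dim = len(embeddings[points[0][0]][points[0][1]])
    # Assumed cluster center: the mean embedding over the marked pixels.
    center = [
        sum(embeddings[r][c][d] for r, c in points) / len(points)
        for d in range(dim)
    ]

    cleaned = [[0] * len(row) for row in mask]
    for r, c in points:
        distance = math.sqrt(
            sum((embeddings[r][c][d] - center[d]) ** 2 for d in range(dim))
        )
        # Keep only the pixels whose embedding stays close to the center.
        if distance <= threshold:
            cleaned[r][c] = 1
    return cleaned


# Toy 1x5 example: three consistent boundary pixels, one outlier, one background pixel.
mask = [[1, 1, 1, 1, 0]]
embeddings = [[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 9.0)]]
cleaned = suppress_noisy_points(mask, embeddings, threshold=2.0)
```

In this toy input the fourth pixel is marked as boundary but its embedding lies far from the other three, so it is the one removed. A mean-based center is sensitive to heavy noise; a median or a clustering step would be a more robust choice under the same assumption.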

Funding Information

National Natural Science Foundation of China (62022076, U1936210, 61972105)
Fundamental Research Funds for the Central Universities (WK3480000011)

Cited by 5 articles