Boundary-Aware Arbitrary-Shaped Scene Text Detector With Learnable Embedding Network
- 30 June 2021
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Multimedia
- Vol. 24 (15209210), 3129-3143
- https://doi.org/10.1109/tmm.2021.3093727
Abstract
Benefiting from advances in deep learning, scene text detection algorithms have developed rapidly in recent years. Methods that represent text regions with a text segmentation map have proved able to capture arbitrary-shaped text more flexibly and accurately. However, such segmentation-based methods are easily disturbed by text-like background patterns (such as fences or grass) and therefore generally suffer from imprecise boundary details. In this paper, LEMNet is proposed to handle the imprecise boundary problem by guiding the generation of text boundaries with an a priori constraint. In the training stage, a Boundary Segmentation Branch is first constructed to predict a coarse boundary mask for each text instance. Then, by mapping pixels into an embedding space, the proposed Pixel Embedding Branch learns to make the embedding representations of boundary points more similar to one another while enlarging the feature distance between background points and boundary points. During inference, noise in the coarse boundary segmentation map is suppressed by a Noisy Point Suppression Algorithm that operates on the pixel embedding vectors. In this way, LEMNet generates a more precise boundary description of text regions. To further enhance the distinguishability of boundary features, we propose a Context Enhancement Module that captures feature interactions in different representation subspaces: attention is applied to the features in parallel, and the outputs are concatenated to generate enhanced features. Extensive experiments on four challenging datasets demonstrate the effectiveness of LEMNet. Specifically, LEMNet achieves F-measures of 85.2%, 87.6%, and 85.2% on CTW1500, Total-Text, and MSRA-TD500, respectively, which were state-of-the-art results at the time of publication.
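The abstract does not state the embedding loss or the suppression rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple pull/push loss around the mean boundary embedding and a median-distance threshold for suppression. The function names, `margin`, and `threshold` are all hypothetical.

```python
import numpy as np


def embedding_loss(emb, is_boundary, margin=1.0):
    """Toy pull/push loss (an assumption, not the paper's loss).

    emb: (N, D) array of pixel embeddings.
    is_boundary: (N,) boolean mask marking boundary pixels.
    """
    # Center of the boundary embeddings.
    center = emb[is_boundary].mean(axis=0)
    dist = np.linalg.norm(emb - center, axis=1)
    # Pull: boundary pixels should lie close to the center.
    pull = np.mean(dist[is_boundary] ** 2)
    # Push: background pixels should lie at least `margin` away.
    push = np.mean(np.maximum(0.0, margin - dist[~is_boundary]) ** 2)
    return pull + push


def suppress_noisy_points(emb, coarse_mask, threshold=0.5):
    """Drop candidate boundary pixels whose embedding is far from the
    median embedding of all candidates (hypothetical suppression rule)."""
    center = np.median(emb[coarse_mask], axis=0)
    dist = np.linalg.norm(emb - center, axis=1)
    return coarse_mask & (dist < threshold)


# Five pixels with 2-D embeddings; the fourth is a background pixel
# that the coarse boundary mask wrongly marks as boundary.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0], [2.1, 1.9]])
coarse = np.array([True, True, True, True, False])
refined = suppress_noisy_points(emb, coarse)
print(refined)
```

In this toy input the fourth pixel's embedding lies far from the other candidates, so the refined mask drops it while keeping the three genuine boundary pixels.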
Funding Information
- National Natural Science Foundation of China (62022076, U1936210, 61972105)
- Fundamental Research Funds for the Central Universities (WK3480000011)