Boundary Recognition of Light-Pause Marks via Grammar Testing Method

Abstract
Boundary recognition is an important research topic in natural language processing, and it provides a basis for applications such as Chinese word segmentation, chunk analysis, and named entity recognition. Motivated by the ambiguity in recognizing the boundaries of Chinese punctuation marks, this paper proposes grammar testing methods for recognizing the boundaries of slight-pause marks and then calculates the annotation consistency achieved with these methods. The statistical results show that the grammar testing methods greatly improve the annotation consistency of slight-pause mark boundary recognition: consistency in the second annotation round was 0.0303 higher than in the first. This helps guarantee the consistency of large-scale corpus annotation and improve annotation quality.
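The abstract does not say which consistency measure was used. As a minimal sketch, assuming two annotators label each slight-pause mark with one categorical boundary tag and agreement is measured with Cohen's kappa, the round-to-round comparison could be computed as follows. The function name `cohen_kappa`, the hypothetical tags "W" (the separated items are single words) and "P" (they extend to phrases), and all label data are illustrative assumptions, not the paper's scheme or results.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labeled identically.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical tags for eight slight-pause marks, annotated in two rounds
# (illustrative data only; "W" = word-level items, "P" = phrase-level items).
round1_a = ["W", "P", "W", "W", "P", "W", "P", "W"]
round1_b = ["W", "W", "W", "P", "P", "W", "P", "W"]
round2_a = ["W", "P", "W", "W", "P", "W", "P", "W"]
round2_b = ["W", "P", "W", "P", "P", "W", "P", "W"]

kappa1 = cohen_kappa(round1_a, round1_b)
kappa2 = cohen_kappa(round2_a, round2_b)
print(f"round 1: {kappa1:.4f}, round 2: {kappa2:.4f}, gain: {kappa2 - kappa1:.4f}")
```

Kappa is used here instead of raw percent agreement because it discounts matches expected by chance; the paper may well use a different measure.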
