Integration of a Segmentation Tool for Arabic Corpora in NooJ Platform to Build an Automatic Annotation Tool

Abstract
Automatic annotation of Arabic corpora plays an important role in many Natural Language Processing (NLP) applications. In this context, we are interested in the automatic annotation of Arabic corpora using a set of transducers implemented in the NooJ platform. To achieve this aim, the annotation phase must be preceded by a segmentation phase. Segmentation, on the one hand, reduces the complexity of the analysis and, on the other hand, improves the functionality of the NooJ platform. For the annotation phase, we identify the different types of lexical ambiguity and propose an appropriate set of rules to handle them. We then evaluate this phase on a test corpus with the NooJ platform. The results obtained are promising and can be improved by adding further rules and heuristics.
