Integration of a Segmentation Tool for Arabic Corpora in NooJ Platform to Build an Automatic Annotation Tool
- 1 January 2016
- book chapter
- conference paper
- Published by Springer Science and Business Media LLC
Abstract
Automatic annotation of Arabic corpora plays an important role in many Natural Language Processing (NLP) applications. In this context, we address the automatic annotation of Arabic corpora using a set of transducers implemented in the NooJ platform. To this end, the annotation phase must be preceded by a segmentation phase. Segmentation reduces the complexity of the analysis and also extends the functionality of the NooJ platform. For the annotation phase, we identify the different types of lexical ambiguity and propose an appropriate set of rules for handling them. We evaluate the approach on a test corpus within the NooJ platform. The results obtained are encouraging and can be improved by adding further rules and heuristics.
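The two-phase idea described in the abstract (segment first, then annotate and prune ambiguous readings with rules) can be illustrated with a minimal sketch. This is plain Python, not NooJ and not the authors' transducers: the lexicon, the tag names (`V`, `N`, `PREP`, `UNK`), the function names, and the single disambiguation rule are all hypothetical and chosen only for illustration.

```python
import re

# Split after sentence-final marks: period, exclamation mark,
# Latin question mark, or Arabic question mark (U+061F).
SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")

# Hypothetical toy lexicon: each surface form maps to its candidate tags.
# The unvocalised form "كتب" is ambiguous between a verb and a noun reading.
LEXICON = {
    "كتب": ["V", "N"],
    "نظر": ["V"],
    "الولد": ["N"],
    "الدرس": ["N"],
    "في": ["PREP"],
}


def segment(text):
    """Phase 1: split raw text into sentence-like segments."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def annotate(segment_text):
    """Phase 2: tag each token and apply one illustrative rule.

    Assumed rule: a token directly after a preposition loses its
    verb reading. Tokens missing from the lexicon are tagged UNK.
    """
    tokens = segment_text.rstrip(".!?\u061F").split()
    result = []
    for tok in tokens:
        tags = list(LEXICON.get(tok, ["UNK"]))
        after_prep = bool(result) and result[-1][1] == ["PREP"]
        if after_prep and len(tags) > 1:
            # Keep the original candidates if the rule would remove them all.
            tags = [t for t in tags if t != "V"] or tags
        result.append((tok, tags))
    return result


sample = "كتب الولد الدرس. نظر الولد في كتب."
segments = segment(sample)
annotations = [annotate(s) for s in segments]
```

In this sketch the ambiguous form keeps both readings at the start of the first segment, where the assumed rule does not apply, and is reduced to the noun reading after the preposition in the second segment. Working on one short segment at a time keeps each rule's context small, which is the kind of complexity reduction the abstract attributes to the segmentation phase.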