A hybrid approach to Arabic named entity recognition

16 October 2013

journal article
research article
Published by SAGE Publications in Journal of Information Science

Vol. 40 (1), 67-87
https://doi.org/10.1177/0165551513502417

Abstract

In this paper, we propose a hybrid named entity recognition (NER) approach that combines the advantages of rule-based and machine learning (ML)-based approaches in order to improve overall system performance and to overcome both the knowledge elicitation bottleneck and the lack of resources for underdeveloped languages that require deep language processing, such as Arabic. The complexity of Arabic poses special challenges to researchers of Arabic NER, which is essential for both monolingual and multilingual applications. We used the hybrid approach to develop an Arabic NER system capable of recognizing 11 types of Arabic named entities: Person, Location, Organization, Date, Time, Price, Measurement, Percent, Phone Number, ISBN and File Name. Extensive experiments were conducted using decision tree, Support Vector Machine and logistic regression classifiers to evaluate system performance. The empirical results indicate that the hybrid approach outperforms both the rule-based and the ML-based approaches when each is used independently. More importantly, our system outperforms the state of the art in Arabic NER in terms of accuracy when applied to the ANERcorp standard dataset, with F-measures of 0.94 for Person, 0.90 for Location and 0.88 for Organization.
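
As a reading aid for the reported scores: the F-measure is conventionally the weighted harmonic mean of precision and recall. The sketch below assumes the balanced form (beta = 1) that is usual in NER evaluation; the abstract does not state which variant was used, nor the underlying precision and recall, so the function name `f_measure` and the input values are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical precision/recall pair, chosen only for illustration.
example_score = f_measure(precision=0.95, recall=0.93)
print(round(example_score, 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a high F-measure requires both precision and recall to be high.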
