A Comparative Survey on Arabic Stemming: Approaches and Challenges

Open Access

1 January 2017

journal article
research article
Published by Scientific Research Publishing, Inc. in Intelligent Information Management

Vol. 09 (02), 39-67
https://doi.org/10.4236/iim.2017.92003

Abstract

Arabic, as one of the Semitic languages, has a very rich and complex morphology, which is radically different from that of the European and East Asian languages. The derivational system of Arabic is based on roots, which are inflected to compose words using a relatively large set of Arabic affixes, e.g., antefixes, prefixes, suffixes, etc. Stemming is the process of reducing all the inflected forms of a word to a common canonical form. It is one of the early and major phases in natural language processing, machine translation and information retrieval tasks. A number of Arabic stemming approaches have been proposed, including light stemming, morphological analysis, statistical stemming, N-grams and parallel corpora (collections). Motivated by the results reported in the literature, this paper attempts to exhaustively review current achievements in stemming Arabic texts. A variety of algorithms are discussed. The main contribution of the paper is to provide a better understanding of existing approaches, with the hope of building an error-free and effective Arabic stemmer in the near future.
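To make the idea of light stemming concrete, the following is a minimal sketch in Python and not the method of any particular stemmer covered by the survey. The affix lists, the minimum stem length and the function name `light_stem` are illustrative assumptions chosen for this example; published light stemmers use their own, more carefully tuned lists and rules.

```python
# Illustrative affix lists (assumptions for this sketch, not taken from the paper).
# Longer prefixes come first so that "وال" is tried before "ال" or "و".
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 3  # hypothetical guard: never shrink a word below three letters


def light_stem(word: str) -> str:
    """Strip at most one prefix and at most one suffix from an undiacritized word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    # Several inflected forms are mapped to one shared form.
    for form in ["الكتاب", "والكتاب", "كتابها"]:
        print(form, "->", light_stem(form))
```

Under these assumed lists, the three forms in the usage example all reduce to the same string, which is the sense in which stemming maps inflected forms to a common canonical form. The length guard is one simple way to limit over-stemming of short words; a sketch this small makes no attempt to recover the root or to handle infixes.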

