Query Expansion for Transliterated Text Retrieval
- 20 July 2021
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Asian and Low-Resource Language Information Processing
- Vol. 20 (4), 1-34
- https://doi.org/10.1145/3447649
Abstract
With Web 2.0, there has been exponential growth in the number of Web users and the volume of Web content. Most of these users are not only consumers of information but also generators of it. People express themselves in colloquial languages but write them in Roman script (transliteration). These texts are mostly informal and casual, and therefore seldom follow grammar rules. Moreover, no prescribed set of spelling rules exists for transliterated text. This freedom leads to large-scale spelling variation, which is a major challenge in mixed-script information processing. This article studies existing phonetic algorithms for handling spelling variation, points out their limitations, and proposes a novel phonetic encoding approach in two flavors, designed for Hindi transliteration. Experiments on Hindi song lyrics retrieval in the mixed-script domain with three retrieval models show that the proposed approaches outperform existing techniques in a majority of cases (sometimes statistically significantly) on a number of metrics, including three rank-cutoff (@k) metrics, MAP, MRR, and Recall.
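To make the idea of phonetic query expansion concrete, the following is a minimal sketch and not the encoding proposed in the article. It uses classic American Soundex purely as a familiar example of a phonetic algorithm, and the function names (`soundex`, `expand_query`) and the small Romanized Hindi vocabulary are hypothetical, chosen only for illustration. Each query term is expanded to every vocabulary term that shares its phonetic code.

```python
from collections import defaultdict

# Classic Soundex consonant classes (vowels, y, h and w carry no code).
_CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in group:
        _CODES[ch] = str(digit)


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]


def expand_query(query: str, vocabulary: list[str]) -> list[set[str]]:
    """Expand each query term with vocabulary terms sharing its Soundex code."""
    by_code = defaultdict(set)
    for term in vocabulary:
        by_code[soundex(term)].add(term)
    return [by_code.get(soundex(t), set()) | {t} for t in query.split()]


# Hypothetical vocabulary of Romanized Hindi spellings (illustrative only).
vocabulary = ["mohabbat", "muhabbat", "mohabat", "zindagi", "jindagi"]
for variants in expand_query("muhabat zindgi", vocabulary):
    print(sorted(variants))
```

In this sketch, "muhabat" is matched with all three spellings of "mohabbat", but "zindgi" is matched only with "zindagi" and not "jindagi", because Soundex always keeps the first letter verbatim. This is one example of the kind of limitation that generic phonetic algorithms can show on transliterated text.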