Source-side Reordering to Improve Machine Translation between Languages with Distinct Word Orders
- 31 July 2021
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Asian and Low-Resource Language Information Processing
- Vol. 20 (4), 1-18
- https://doi.org/10.1145/3448252
Abstract
English and Hindi differ significantly in word order: English follows subject-verb-object (SVO) order, while Hindi primarily follows subject-object-verb (SOV) order. This difference makes the language pair challenging to model for translation. In phrase-based translation systems, word reordering is governed by the language model, the phrase table, and the reordering models. Reordering in such systems is generally achieved during decoding by transposing words within a defined window. These systems handle local reorderings, and some phrase-level reordering takes place when phrases are formed, but they are weak at learning long-distance reorderings. To overcome this weakness, researchers have applied reordering as a pre-processing step so that the reordered source sentence is closer to the target language in word order. Such approaches use part-of-speech (POS) tag sequences, reorder the syntax tree with grammatical rules, or apply head finalization. This study shows that head finalization alone is not sufficient for reordering sentences in the English-Hindi language pair. It describes various grammatical constructs and presents a comparative evaluation of reorderings against the original and the head-finalized representations. The impact of reordering on translation quality is measured by BLEU score in phrase-based statistical and neural machine translation systems. A significant gain in BLEU score was noted for reorderings in different grammatical constructs.
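To make the idea of head finalization concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a toy dependency tree in which each head word lists its dependents in source order; the `Node` class, the `head_finalize` function, and the example sentence are hypothetical names chosen for illustration. The sketch moves every head after all of its dependents, which turns a simple English SVO clause into SOV order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy dependency-tree node: a word plus its dependents in source order.

    Hypothetical structure for illustration; a real system would obtain
    the tree from a syntactic parser.
    """
    word: str
    children: List["Node"] = field(default_factory=list)


def head_finalize(node: Node) -> List[str]:
    """Emit all dependents (recursively, in source order), then the head."""
    words: List[str] = []
    for child in node.children:
        words.extend(head_finalize(child))
    words.append(node.word)
    return words


# Toy tree for "Ram eats ripe mangoes": the verb heads the subject and
# the object, and the object noun heads its adjective.
tree = Node("eats", [Node("Ram"), Node("mangoes", [Node("ripe")])])

reordered = head_finalize(tree)
print(" ".join(reordered))  # Ram ripe mangoes eats
```

The output places the verb last, as in Hindi. The abstract reports that this operation alone is not sufficient for English-Hindi, so the sketch shows only the baseline transformation and not the additional reorderings for specific grammatical constructs that the study evaluates.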