Source-side Reordering to Improve Machine Translation between Languages with Distinct Word Orders

31 July 2021

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Asian and Low-Resource Language Information Processing

Vol. 20 (4), 1-18
https://doi.org/10.1145/3448252

Abstract

English and Hindi have significantly different word orders: English follows subject-verb-object (SVO) order, while Hindi primarily follows subject-object-verb (SOV) order. This difference makes the language pair challenging to model for translation. In phrase-based translation systems, word reordering is governed by the language model, the phrase table, and reordering models, and is generally achieved during decoding by transposing words within a defined window. These systems can handle local reorderings, and some phrase-level reorderings are captured when phrases are formed, but they are weak at learning long-distance reorderings. To overcome this weakness, researchers have used reordering as a pre-processing step that brings the source sentence closer to the target language in word order. Such approaches rely on part-of-speech (POS) tag sequences, on reordering the syntax tree with grammatical rules, or on head finalization. This study shows that head finalization alone is not sufficient for reordering sentences in the English-Hindi language pair. It describes various grammatical constructs and presents a comparative evaluation of reorderings against the original and the head-finalized representations. The impact of reordering on translation quality is measured by BLEU score in phrase-based statistical and neural machine translation systems. A significant gain in BLEU score was noted for reorderings in different grammatical constructs.
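
To make the head-finalization baseline mentioned in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the `Node` class, the `head` index, and the toy parse of "John eats apples" are all assumptions made for illustration. The sketch only moves each constituent's head child to the final position, which is the baseline the study reports as insufficient on its own for English-Hindi.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical constituency-tree node (illustrative, not from the paper)."""
    label: str
    word: Optional[str] = None                 # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)
    head: int = 0                              # index of the head child


def head_finalize(node: Node) -> Node:
    """Return a copy of the tree in which every head child is moved last."""
    if not node.children:
        return node
    kids = [head_finalize(child) for child in node.children]
    head_child = kids.pop(node.head)           # take the head out of its slot
    kids.append(head_child)                    # and place it at the end
    return Node(node.label, node.word, kids, head=len(kids) - 1)


def leaves(node: Node) -> List[str]:
    """Read the words of the tree from left to right."""
    if not node.children:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


# Toy SVO sentence: the verb heads the VP, and the VP heads the sentence.
tree = Node("S", children=[
    Node("NP", word="John"),
    Node("VP", children=[
        Node("V", word="eats"),
        Node("NP", word="apples"),
    ], head=0),
], head=1)

reordered = leaves(head_finalize(tree))
print(" ".join(reordered))  # prints "John apples eats"
```

On this toy input the verb moves behind its object, giving an SOV-like order. Real pre-ordering systems work from an actual parser's output and, as the abstract argues, need construct-specific rules beyond this single operation.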
