Word Length Distribution in Mongolian

Abstract
This paper examines the distribution of word length and stem length in Mongolian, using both dynamic language resources (a corpus of 1 million Mongolian word tokens) and static ones (an orthographic Mongolian dictionary and a Mongolian stem dictionary). The results show that Mongolian word and stem lengths follow Poisson-type distributions. Specifically, word length in the dynamic corpus follows the Dacey-Poisson distribution, while all the other length distributions follow the Conway-Maxwell-Poisson distribution. In addition, Mongolian word length is influenced by word frequency, broadly in line with Zipf’s Principle of Least Effort. Experiments fitting a power-function relationship between Mongolian word length and word frequency, using individual short texts, continuous long texts, and fixed-length texts, indicate that individual texts of fixed length (about 2000 words) yield the best fit.
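
As an illustration of what fitting a power function between word length and word frequency can look like, here is a minimal Python sketch. It assumes the form y = a * x^b and fits it by ordinary least squares on log-transformed values. The function name `fit_power_law`, the choice of word length as the independent variable, and the data are all hypothetical; the abstract does not specify the paper's fitting procedure, and the numbers below are synthetic, not taken from the Mongolian corpus.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log-log axes; return (a, b).

    Hypothetical helper for illustration only. All xs and ys must be > 0.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the regression line log(y) = log(a) + b * log(x)
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data (an assumption, not from the paper): word lengths and
# mean frequencies generated from an exact power law with a=120, b=-1.5.
lengths = [1, 2, 3, 4, 5, 6]
mean_freqs = [120.0 * length ** -1.5 for length in lengths]

a, b = fit_power_law(lengths, mean_freqs)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On real data the points will not lie exactly on a power law, so a goodness-of-fit measure (for example, the coefficient of determination on the log-log scale) would be needed to compare short, long, and fixed-length texts.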
