Creating an Oral Educational Corpus of the Russian Language for Non-Native Speakers: Initial Results and Prospects

Abstract
The article presents the initial results of a project to design an oral educational corpus: a transcribed and annotated collection of recordings of spontaneous (unprepared) speech by students learning Russian as a foreign language. It includes a literature review on the design of oral corpora; a discussion of how to choose stimuli that elicit spontaneous oral speech from non-native speakers; a description of the transcription experience, with a classification and summary of non-standard phonetic and communicative phenomena; and a qualitative examination of a number of deviations whose properties cannot be investigated in written speech. The article explores the following prosodic features typical of the oral speech of a foreign student: pauses, hesitations (voiced pauses), physiological pauses, phonetic inaccuracies, and self-corrections. A quantitative examination of the four trial speech samples showed that the recordings do not differ in the number of physiological pauses. However, the numbers of phonetic inaccuracies, voiced pauses, and self-corrections fluctuate significantly. Comparing these observations, we identified several profiles that reflect the speaker’s communicative performance: a) a profile with a significant number of pauses, indicating planning of the statement, but with few corrections and phonetic inaccuracies; in this case the foreigner’s speech is slow but grammatically and phonetically more accurate and cohesive; b) a profile with a significant number of pauses, phonetic inaccuracies, and self-corrections: the speaker has difficulties with statement planning and pronunciation; c) a profile with few pauses and hesitations but a significant number of phonetic inaccuracies: the speech is quite fast, while the pronunciation is rather poor.