Detection of vowel onset point in speech

Abstract
The sound units of many languages are syllabic in nature, and the frequently used syllables are of the consonant-vowel (CV) type. The vowel onset point (VOP), the instant at which the vowel begins, is an important event in CV units. Knowledge of VOPs is useful in many applications, such as speech recognition, speaker recognition, speech enhancement, begin-end detection, segmentation of speech into vowel-like and nonvowel-like units, and estimation of vowel durations. In this paper, we describe parameters (features) useful for manually identifying VOPs in different types of CV units. We also propose an automatic algorithm for detecting VOPs in continuous speech, motivated by the nature of speech production and perception. The speech signal results from exciting a time-varying vocal tract system with a time-varying excitation. Changes in both the source and the system characteristics around the VOP can be used to detect it; in this paper, we use the changes in the source (excitation) characteristics. The performance of the proposed algorithm is evaluated on 25 sentences containing a total of 236 manually identified VOPs, of which 216 (about 91.5%) are detected within ±30 ms of the manually marked location. Compared with the energy-based approach, VOP-based begin-end detection significantly improves the performance of a text-dependent speaker verification system. For a telephone database of 32 speakers consisting of 480 genuine