A Study on Universal Background Model Training in Speaker Verification
- 14 February 2011
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Audio, Speech, and Language Processing
- Vol. 19 (7), 1890-1899
- https://doi.org/10.1109/tasl.2010.2102753
Abstract
State-of-the-art Gaussian mixture model (GMM)-based speaker recognition/verification systems utilize a universal background model (UBM), which typically requires extensive resources, especially if multiple channel and microphone categories are considered. In this study, a systematic analysis of speaker verification system performance is considered for which the UBM data is selected and purposefully altered in different ways, including variation in the amount of data, sub-sampling structure of the feature frames, and variation in the number of speakers. An objective measure is formulated from the UBM covariance matrix which is found to be highly correlated with system performance when the data amount was varied while keeping the UBM data set constant, and increasing the number of UBM speakers while keeping the data amount constant. The advantages of feature sub-sampling for improving UBM training speed are also discussed, and a novel and effective phonetic distance-based frame selection method is developed. The sub-sampling methods presented are shown to retain baseline equal error rate (EER) system performance using only 1% of the original UBM data, resulting in a drastic reduction in UBM training computation time. This, in theory, dispels the myth of “There's no data like more data” for the purpose of UBM construction. With respect to the UBM speakers, the effect of systematically controlling the number of training (UBM) speakers versus overall system performance is analyzed. It is shown experimentally that increasing the inter-speaker variability in the UBM data while maintaining the overall total data size constant gradually improves system performance. Finally, two alternative speaker selection methods based on different speaker diversity measures are presented. Using the proposed schemes, it is shown that by selecting a diverse set of UBM speakers, the baseline system performance can be retained using less than 30% of the original UBM speakers.
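The abstract does not spell out how the feature frames are sub-sampled, so the following is only a minimal sketch of the general idea, assuming the UBM training data is held as a list of per-frame feature vectors. The function names `subsample_uniform` and `subsample_random` and the `keep_fraction` parameter are hypothetical, and neither function is the phonetic distance-based frame selection method developed in the paper.

```python
import random


def subsample_uniform(frames, keep_fraction=0.01):
    """Keep every k-th frame, where k is about 1 / keep_fraction.

    Illustrative assumption: plain decimation, not the paper's method.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    step = max(1, round(1.0 / keep_fraction))
    return frames[::step]


def subsample_random(frames, keep_fraction=0.01, seed=0):
    """Keep a random subset of frames, preserving their original order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not frames:
        return []
    n_keep = max(1, round(len(frames) * keep_fraction))
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(len(frames)), n_keep))
    return [frames[i] for i in indices]


if __name__ == "__main__":
    # 1000 toy one-dimensional "feature vectors".
    frames = [[float(i)] for i in range(1000)]
    print(len(subsample_uniform(frames)))  # 10 frames at a 1% retention rate
    print(len(subsample_random(frames)))   # 10 frames at a 1% retention rate
```

The reduced frame set would then be passed to whatever GMM training routine builds the UBM; the training cost per iteration scales with the number of frames retained.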
This publication has 15 references indexed in Scilit:
- Front-End Factor Analysis for Speaker Verification. IEEE Transactions on Audio, Speech, and Language Processing, 2010
- A novel feature sub-sampling method for efficient universal background model training in speaker verification. Published by Institute of Electrical and Electronics Engineers (IEEE), 2010
- Factor analysis-based information integration for Arabic dialect identification. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- A Study of Interspeaker Variability in Speaker Verification. IEEE Transactions on Audio, Speech, and Language Processing, 2008
- Advances In Channel Compensation For SVM Speaker Recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- D-MAP: a distance-normalized MAP estimation of speaker models for automatic speaker verification. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 2000
- Speaker Verification Using Adapted Gaussian Mixture Models. Digital Signal Processing, 2000
- Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 1995
- On Information and Sufficiency. The Annals of Mathematical Statistics, 1951