Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

Abstract

In this paper, we analyze how several factors and configuration choices made during training and model construction affect the quality and stability of speaker adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm, constrained structural maximum a posteriori linear regression (CSMAPLR), whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. We investigate six major aspects of speaker adaptation: the initial models; the amount of training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. To analyze the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models with the single use of the gender-dependent models. To analyze the effect of the transform functions, we compare a transform function applied only to mean vectors with one applied to both mean vectors and covariance matrices. To analyze the effect of the estimation criteria, we compare the maximum likelihood (ML) criterion with a robust estimation criterion called structural maximum a posteriori (structural MAP). We evaluate the sensitivity of the piecewise linear regression algorithms to several thresholds and examine methods that combine MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.
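
The distinction between a transform for mean vectors only and a transform for both mean vectors and covariance matrices can be illustrated with a small sketch. It assumes the commonly used textbook forms: a mean-only linear regression transform maps a Gaussian mean as mu' = A mu + b and leaves the covariance unchanged, while a constrained transform shares the same matrix A between the mean and the covariance, giving Sigma' = A Sigma A^T. The function names, the sign convention for b, and the numeric values are hypothetical choices made for illustration. The sketch only applies a given transform; it does not show how A and b are estimated under the ML or structural MAP criteria, and it is not the paper's implementation.

```python
# Illustrative sketch only: applies a given affine transform to one Gaussian.
# A and b are made-up values; in practice they are estimated from adaptation data.

def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def mat_mul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Return the transpose of matrix A."""
    return [list(col) for col in zip(*A)]

def adapt_mean_only(mu, sigma, A, b):
    """Mean-only transform: mu' = A mu + b, covariance left unchanged."""
    new_mu = [m + bi for m, bi in zip(mat_vec(A, mu), b)]
    return new_mu, sigma

def adapt_constrained(mu, sigma, A, b):
    """Constrained transform: the same A adapts mean and covariance.

    mu' = A mu + b, Sigma' = A Sigma A^T
    """
    new_mu = [m + bi for m, bi in zip(mat_vec(A, mu), b)]
    new_sigma = mat_mul(mat_mul(A, sigma), transpose(A))
    return new_mu, new_sigma

# Hypothetical 2-dimensional Gaussian from an initial (average voice) model.
mu = [1.0, 2.0]
sigma = [[1.0, 0.0], [0.0, 4.0]]

# Hypothetical transform toward a target speaker.
A = [[2.0, 0.0], [0.0, 0.5]]
b = [0.5, -1.0]

print(adapt_mean_only(mu, sigma, A, b))
print(adapt_constrained(mu, sigma, A, b))
```

In this toy case both variants move the mean to the same point, but only the constrained variant also rescales the variances, which is the difference the comparison of transform functions is concerned with.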
