TY - GEN
T1 - Comprehensive voice conversion analysis based on DGMM and feature combination
AU - Pan, He
AU - Wei, Yangjie
AU - Guan, Nan
AU - Wang, Yi
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/4/2
Y1 - 2014/4/2
N2 - A voice conversion system modifies a speaker's voice so that it is perceived as having been uttered by another speaker, and it is now widely used in many real applications. However, most research focuses on only one aspect of voice conversion performance; theoretical analysis and experimental comparison of the whole source-target speaker voice conversion process are rare. Therefore, this paper presents a comprehensive analysis of source-target speaker voice conversion based on three key steps: acoustic feature selection and extraction, voice conversion model construction, and target speech synthesis. A complete and optimal source-target speaker voice conversion process is proposed. First, a comprehensive feature combination consisting of prosodic features, spectrum parameters, and spectral envelope characteristics is proposed. Then, to avoid discontinuity and spectral distortion in the converted speech, a DGMM (Dynamic Gaussian Mixture Model) that considers dynamic information between frames is presented. Subsequently, for speech synthesis, the STRAIGHT algorithm synthesizer is modified to work with the feature combination. Finally, objective comparison experiments show that the new source-target voice conversion process achieves better performance than conventional methods. In addition, a speaker recognition system is used to evaluate the quality of the converted speech, and the experimental results show that the converted speech has higher target speaker individuality and speech quality.
AB - A voice conversion system modifies a speaker's voice so that it is perceived as having been uttered by another speaker, and it is now widely used in many real applications. However, most research focuses on only one aspect of voice conversion performance; theoretical analysis and experimental comparison of the whole source-target speaker voice conversion process are rare. Therefore, this paper presents a comprehensive analysis of source-target speaker voice conversion based on three key steps: acoustic feature selection and extraction, voice conversion model construction, and target speech synthesis. A complete and optimal source-target speaker voice conversion process is proposed. First, a comprehensive feature combination consisting of prosodic features, spectrum parameters, and spectral envelope characteristics is proposed. Then, to avoid discontinuity and spectral distortion in the converted speech, a DGMM (Dynamic Gaussian Mixture Model) that considers dynamic information between frames is presented. Subsequently, for speech synthesis, the STRAIGHT algorithm synthesizer is modified to work with the feature combination. Finally, objective comparison experiments show that the new source-target voice conversion process achieves better performance than conventional methods. In addition, a speaker recognition system is used to evaluate the quality of the converted speech, and the experimental results show that the converted speech has higher target speaker individuality and speech quality.
KW - DGMM
KW - STRAIGHT synthesis
KW - feature combination
KW - speaker recognition
KW - voice conversion
UR - https://www.scopus.com/pages/publications/84983200521
U2 - 10.1109/AMS.2014.39
DO - 10.1109/AMS.2014.39
M3 - Conference contribution
AN - SCOPUS:84983200521
T3 - Proceedings - Asia Modelling Symposium 2014: 8th Asia International Conference on Mathematical Modelling and Computer Simulation, AMS 2014
SP - 159
EP - 164
BT - Proceedings - Asia Modelling Symposium 2014
A2 - Ma, Shang-Pin
A2 - Kuo, Jong
A2 - Liu, Chien-Hung
A2 - Ibrahim, Zuwairie
A2 - Al-Dabass, David
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 8th Asia International Conference on Mathematical Modelling and Computer Simulation - Asia Modelling Symposium, AMS 2014
Y2 - 23 September 2014 through 25 September 2014
ER -