TY - JOUR
T1 - CVTE-Poly
T2 - 24th Annual Conference of the International Speech Communication Association, Interspeech 2023
AU - Zhang, Siheng
AU - Tan, Xingjun
AU - Lei, Yanqiang
AU - Wang, Xianxiang
AU - Zhang, Zhizhong
AU - Xie, Yuan
N1 - Publisher Copyright:
© 2023 International Speech Communication Association. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Conversion from graphemes to phonemes is an essential component of Text-To-Speech systems. In Chinese, one main challenge is polyphone disambiguation: determining the pronunciation of characters that have multiple pronunciations. For this task, the benchmark dataset Chinese Polyphone disambiguation with Pinyin (CPP) suffers from two main limitations. First, it contains some labels that are wrong according to the newest official dictionary. Second, it is imbalanced, so models learned from it are biased towards frequently used pronunciations and polyphones. In this paper, we refine CPP and release a new dataset named CVTE-poly, which contains 845,254 samples, is nearly ten times the size of CPP, and is more balanced. In addition, we propose a comprehensive measurement for the polyphone disambiguation task to counter the data imbalance problem. Experiments show that our simple but flexible baseline trained on CVTE-poly outperforms existing models, which demonstrates the benefit of our dataset.
KW - Chinese graphemes to phonemes
KW - deep learning
KW - polyphone disambiguation
KW - text-to-speech
UR - https://www.scopus.com/pages/publications/85171547871
U2 - 10.21437/Interspeech.2023-553
DO - 10.21437/Interspeech.2023-553
M3 - Conference article
AN - SCOPUS:85171547871
SN - 2308-457X
VL - 2023-August
SP - 5526
EP - 5530
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 20 August 2023 through 24 August 2023
ER -