CVTE-Poly: A New Benchmark for Chinese Polyphone Disambiguation

  • Siheng Zhang
  • Xingjun Tan
  • Yanqiang Lei
  • Xianxiang Wang
  • Zhizhong Zhang*
  • Yuan Xie

*Corresponding author for this work

Research output: Contribution to journal; Conference article; peer-review

Abstract

Conversion from graphemes to phonemes is an essential component of Text-To-Speech systems. In Chinese, one main challenge is polyphone disambiguation: determining the pronunciation of characters that have multiple pronunciations. For this task, the benchmark dataset Chinese Polyphone disambiguation with Pinyin (CPP) suffers from two main limitations. First, it contains some labels that are wrong with respect to the newest official dictionary. Second, it is imbalanced, so models learned from it show a bias towards frequently used pronunciations and polyphones. In this paper, we refine CPP and release a new dataset named CVTE-poly, which contains 845254 samples, is nearly ten times the size of CPP, and is more balanced. In addition, we propose a comprehensive measurement for the polyphone disambiguation task that accounts for the data imbalance problem. Experiments show that our simple but flexible baseline trained on CVTE-poly outperforms existing models, which demonstrates the benefit of our dataset.

Original language: English
Pages (from-to): 5526-5530
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
State: Published - 2023
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 - 24 Aug 2023

Keywords

  • Chinese graphemes to phonemes
  • deep learning
  • polyphone disambiguation
  • text-to-speech
