Sell-corpus: An Open Source Multiple Accented Chinese-english Speech Corpus for L2 English Learning Assessment

Yu Chen, Jun Hu, Xinyu Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

7 Scopus citations

Abstract

We present SELL-CORPUS, a multiple accented speech corpus for L2 English learning in China, aiming at the potential research of multiple accented acoustic model, mispronunciation detection and pronunciation assessment for future nationwide oral English tests. Our corpus contains 31.6 hour speech recordings contributed by 389 volunteer speakers, including 186 males and 203 females. Our corpus covers seven major regional dialects and provides a baseline for Chinese multiple accented automatic speech recognition system. We released our speech corpus to the public for academic research. To the best of our knowledge, it is the first open-source English speech corpus that accounts for the accents of all major Chinese regional dialects.

Original languageEnglish
Title of host publication2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages7425-7429
Number of pages5
ISBN (Electronic)9781479981311
DOIs
StatePublished - May 2019
Event44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 12 May 201917 May 2019

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2019-May
ISSN (Print)1520-6149

Conference

Conference44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/TerritoryUnited Kingdom
CityBrighton
Period12/05/1917/05/19

Keywords

  • Automatic speech recognition
  • Chinese dialects
  • English pronunciation assessment
  • English speech corpus
  • Second language learning

Fingerprint

Dive into the research topics of 'Sell-corpus: An Open Source Multiple Accented Chinese-english Speech Corpus for L2 English Learning Assessment'. Together they form a unique fingerprint.

Cite this