Multi-task transformer with input feature reconstruction for dysarthric speech recognition

Chaoyue Ding, Shiliang Sun, Jing Zhao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Dysarthria is a motor speech disorder caused by damage to the part of the nervous system that controls the physical production of speech. High inter- and intra-speaker variability makes it very challenging to build robust dysarthric speech recognition (DSR) systems. To address this, we propose a multi-task Transformer with input feature reconstruction as an auxiliary task, where the main task of DSR and the auxiliary reconstruction task share the same encoder network. The auxiliary task aims to reconstruct clear speech features from corrupted speech of healthy speakers (intra-domain) or dysarthric speakers (cross-domain). Further, to alleviate the imbalanced distribution of dysarthria data sets, we devise an adaptive rebalance sampling scheme that increases the sampling frequency of dysarthric utterances. Experimental results show that the proposed model considerably outperforms other baselines across speakers with varying severity of dysarthria.
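
The two ideas in the abstract, a shared-encoder multi-task objective and rebalanced utterance sampling, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the loss weighting or the exact sampling rule, so the function names `multitask_loss` and `rebalance_weights`, the parameters `recon_weight` and `power`, and the toy labels below are all assumptions made for illustration. In particular, the sketch uses fixed inverse-frequency weights, whereas the paper describes an adaptive scheme.

```python
import random
from collections import Counter


def rebalance_weights(labels, power=1.0):
    """Return one sampling weight per utterance (hypothetical helper).

    Each weight is 1 / (class count ** power), so utterances from rare
    classes are drawn more often. power=0 gives uniform sampling and
    power=1 gives each class the same total sampling mass.
    """
    counts = Counter(labels)
    return [1.0 / (counts[label] ** power) for label in labels]


def multitask_loss(asr_loss, recon_loss, recon_weight=0.3):
    """Combine the main DSR loss with the auxiliary reconstruction loss.

    recon_weight is an assumed hyperparameter, not a value from the paper.
    """
    return asr_loss + recon_weight * recon_loss


# Toy, made-up label list: healthy speech outnumbers dysarthric speech 4 to 1.
labels = ["healthy"] * 8 + ["dysarthric"] * 2
weights = rebalance_weights(labels)

# Draw utterance indices with replacement according to the weights.
rng = random.Random(0)
batch_indices = rng.choices(range(len(labels)), weights=weights, k=1000)
dysarthric_share = sum(labels[i] == "dysarthric" for i in batch_indices) / 1000
print(f"dysarthric share of sampled utterances: {dysarthric_share:.2f}")
```

With `power=1.0` both classes carry the same total sampling mass, so dysarthric utterances make up roughly half of the draws even though they are only a fifth of the toy data. In a real training loop the two loss terms would come from a recognition decoder and a reconstruction head that share one encoder.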

Original language: English
Title of host publication: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7318-7322
Number of pages: 5
ISBN (Electronic): 9781728176055
State: Published - 2021
Event: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 6 Jun 2021 to 11 Jun 2021

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2021-June
ISSN (Print): 1520-6149

Conference

Conference: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Country/Territory: Canada
City: Virtual, Toronto
Period: 6/06/21 to 11/06/21

Keywords

  • Dysarthric speech recognition
  • Multi-task
  • Reconstruction
