TY - GEN
T1 - A sequential contrastive learning framework for robust dysarthric speech recognition
AU - Wu, Lidan
AU - Zong, Daoming
AU - Sun, Shiliang
AU - Zhao, Jing
N1 - Publisher Copyright:
© 2021 Institute of Electrical and Electronics Engineers Inc.. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Dysarthria is a manifestation of disruption in the neuromuscular physiology resulting in uneven, slow, slurred, harsh, or quiet speech. Despite the remarkable progress of automatic speech recognition (ASR), it poses great challenges in developing stable ASR for dysarthric individuals due to the high intra- and inter-speaker variations and data deficiency. In this paper, we propose a contrastive learning framework for robust dysarthric speech recognition (DSR) by capturing the dysarthric speech variability. Several speech data augmentation strategies are explored to form two branches of the framework, meanwhile alleviating the scarcity of dysarthria data. We also develop an efficient projection head acting on a sequence of learned hidden representations for defining contrastive loss. Experiment results on DSR demonstrate that the model is better than or comparable to the supervised baseline.
AB - Dysarthria is a manifestation of disruption in the neuromuscular physiology resulting in uneven, slow, slurred, harsh, or quiet speech. Despite the remarkable progress of automatic speech recognition (ASR), it poses great challenges in developing stable ASR for dysarthric individuals due to the high intra- and inter-speaker variations and data deficiency. In this paper, we propose a contrastive learning framework for robust dysarthric speech recognition (DSR) by capturing the dysarthric speech variability. Several speech data augmentation strategies are explored to form two branches of the framework, meanwhile alleviating the scarcity of dysarthria data. We also develop an efficient projection head acting on a sequence of learned hidden representations for defining contrastive loss. Experiment results on DSR demonstrate that the model is better than or comparable to the supervised baseline.
KW - Contrastive learning
KW - Data augmentation
KW - Dysarthric speech recognition
KW - Self-supervised learning
UR - https://www.scopus.com/pages/publications/85115194846
U2 - 10.1109/ICASSP39728.2021.9415017
DO - 10.1109/ICASSP39728.2021.9415017
M3 - 会议稿件
AN - SCOPUS:85115194846
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7303
EP - 7307
BT - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Y2 - 6 June 2021 through 11 June 2021
ER -