TY - JOUR
T1 - Hybrid Expert Knowledge and Self-Supervised Learning for Diagnostic Modeling of Adductor Spasmodic and Primary Myotonic Dysphonia
AU - Du, Zhou
AU - Chen, Hang
AU - Ding, Huijun
AU - Du, Jun
AU - Chen, Zhen
N1 - Publisher Copyright: © 2025 International Speech Communication Association. All rights reserved.
PY - 2025
Y1 - 2025
AB - Dysphonia encompasses a broad spectrum of vocal disorders with diverse etiologies, among which adductor spasmodic dysphonia (ADSD) and primary muscle tension dysphonia (pMTD) are particularly challenging to diagnose. Currently, the primary diagnostic method relies on subjective auditory perception by highly experienced clinicians. To alleviate the scarcity of diagnostic resources, this study develops a deep learning-based approach for automatically diagnosing ADSD and pMTD from patients' speech data. Our contributions are: (1) designing a convolutional neural network (CNN)-based diagnostic model that leverages handcrafted features derived from expert knowledge and (2) incorporating self-supervised learning (SSL) to adaptively extract more discriminative input representations from raw waveforms. This marks the first application of deep learning techniques to ADSD and pMTD diagnostic modeling, achieving a classification accuracy of 83.3% on our newly constructed dataset.
KW - diagnostic speech processing
KW - self-supervised learning
KW - speech disorder classification
UR - https://www.scopus.com/pages/publications/105020077987
U2 - 10.21437/Interspeech.2025-1406
DO - 10.21437/Interspeech.2025-1406
M3 - Conference article
AN - SCOPUS:105020077987
SN - 2308-457X
SP - 3543
EP - 3547
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 26th Interspeech Conference 2025
Y2 - 17 August 2025 through 21 August 2025
ER -