TY - JOUR
T1 - A denoising-aided multi-task learning method for blind estimation of reverberation time
AU - Zhang, Yulong
AU - Sang, Jinqiu
AU - Zheng, Chengshi
AU - Li, Xiaodong
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/5/31
Y1 - 2024/5/31
N2 - Noise in reverberant speech severely limits the accuracy with which current deep learning (DL) methods estimate the reverberation time T60. To address this issue, this paper proposes a denoising-aided multi-task learning (DAMTL) method for blind T60 estimation. Specifically, speech denoising serves as an auxiliary task that is trained jointly with T60 estimation to improve prediction accuracy. The two tasks are integrated into one DL network by sharing the same encoder network, where the complex-valued spectrum is introduced to extract comprehensive high-dimensional features from noisy reverberant speech. Complex-valued formulations of the 2-D convolutional layer (Conv2d), batch normalization, and long short-term memory (LSTM) are then given. Furthermore, the noise robustness and applicability of DAMTL are examined by comparison with state-of-the-art DL-based methods on simulated data and real-world recorded data. The results demonstrate the effectiveness and superiority of the proposed DAMTL, especially in low signal-to-noise ratio (SNR) scenarios and practical applications.
AB - Noise in reverberant speech severely limits the accuracy with which current deep learning (DL) methods estimate the reverberation time T60. To address this issue, this paper proposes a denoising-aided multi-task learning (DAMTL) method for blind T60 estimation. Specifically, speech denoising serves as an auxiliary task that is trained jointly with T60 estimation to improve prediction accuracy. The two tasks are integrated into one DL network by sharing the same encoder network, where the complex-valued spectrum is introduced to extract comprehensive high-dimensional features from noisy reverberant speech. Complex-valued formulations of the 2-D convolutional layer (Conv2d), batch normalization, and long short-term memory (LSTM) are then given. Furthermore, the noise robustness and applicability of DAMTL are examined by comparison with state-of-the-art DL-based methods on simulated data and real-world recorded data. The results demonstrate the effectiveness and superiority of the proposed DAMTL, especially in low signal-to-noise ratio (SNR) scenarios and practical applications.
KW - Blind reverberation time estimation
KW - Low signal-to-noise-ratio scenario
KW - Multi-task learning
KW - Noisy environment
KW - Speech denoising
UR - https://www.scopus.com/pages/publications/85189289520
U2 - 10.1016/j.measurement.2024.114568
DO - 10.1016/j.measurement.2024.114568
M3 - Article
AN - SCOPUS:85189289520
SN - 0263-2241
VL - 231
JO - Measurement: Journal of the International Measurement Confederation
JF - Measurement: Journal of the International Measurement Confederation
M1 - 114568
ER -