Blind reverberation time estimation using multi-channel speech recordings

  • Kaitong Zheng
  • Chengshi Zheng
  • Jinqiu Sang*
  • Xiaodong Li

*Corresponding author for this work

Research output: Contribution to journal (conference article, peer-reviewed)

Abstract

Reverberation time is an objective measure of reverberation and one of the most important parameters in room acoustics. Numerous methods for single-channel blind reverberation time estimation have been proposed and have achieved promising results. However, estimation from single-channel speech can exploit only spectro-temporal characteristics. The direct and early reflected sounds and the late reverberation have quite different spatial properties: the former are directional, while the latter is more diffuse. Blind estimation from multi-channel speech recorded with microphone arrays can therefore also exploit spatial information, which is expected to improve estimation performance. In this work, a multi-channel blind reverberation time estimation method is proposed. The complex spectra of the multi-channel speech recordings are used as the input features of a neural network whose architecture is designed to extract spatial information. Estimation using single-channel, 2-channel and 3-channel speech recordings was evaluated in noisy environments at different SNRs. Experimental results showed that multi-channel estimation methods outperformed single-channel methods.
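
The input representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the STFT settings, the feature layout, or the noise-mixing procedure, so the frame length, hop size, Hann window, the stacking of real and imaginary parts as separate feature planes, and both function names below are assumptions made for illustration only.

```python
import numpy as np


def multichannel_complex_spectra(x, n_fft=512, hop=256):
    """Stack real and imaginary STFT parts of a multi-channel signal.

    x: array of shape (channels, samples).
    Returns an array of shape (2 * channels, frames, n_fft // 2 + 1).
    n_fft and hop are illustrative defaults, not values from the paper.
    """
    x = np.atleast_2d(np.asarray(x, dtype=np.float64))
    n_ch, n_samples = x.shape
    if n_samples < n_fft:
        raise ValueError("signal is shorter than one analysis frame")

    window = np.hanning(n_fft)
    n_frames = 1 + (n_samples - n_fft) // hop
    # Index matrix of shape (frames, n_fft): one row of sample indices per frame.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[:, idx] * window           # (channels, frames, n_fft)
    spec = np.fft.rfft(frames, axis=-1)   # complex, (channels, frames, bins)

    # Keeping real and imaginary parts (rather than magnitude only) preserves
    # the inter-channel phase differences that carry spatial information.
    return np.concatenate([spec.real, spec.imag], axis=0)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB.

    Power is averaged over all channels and samples (an assumption).
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal((3, 16000))   # stand-in for a 3-channel recording
    noise = rng.standard_normal((3, 16000))
    noisy = add_noise_at_snr(speech, noise, snr_db=10.0)
    features = multichannel_complex_spectra(noisy)
    print(features.shape)
```

Under these assumptions, a 3-channel recording yields six feature planes (three real, three imaginary), which a network could then process jointly across channels; the single-channel and 2-channel cases follow by passing fewer rows.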

Original language: English
Journal: Proceedings of the International Congress on Acoustics
State: Published - 2022
Externally published: Yes
Event: 24th International Congress on Acoustics, ICA 2022 - Gyeongju, Korea, Republic of
Duration: 24 Oct 2022 to 28 Oct 2022

Keywords

  • Deep learning
  • Multiple channels
  • Reverberation time estimation
