VB-Adapter: Variational Bayesian Adapter for Cross-Domain Speech Representation Learning

  • Jing Zhao
  • Qimin Huang
  • Shanhu Wang
  • Shiliang Sun*

  *Corresponding author for this work

Research output: Contribution to journal, article, peer-reviewed

Abstract

To leverage the abundant speech data available for pretraining, current models excel in generalization across diverse tasks. Nevertheless, real-world challenges emerge when addressing unfamiliar speech scenarios far from the pretrained speech, owing to the domain shift between pretraining (source) and fine-tuning (target) data. To overcome this barrier, we propose a variational Bayesian adapter (VB-Adapter) for cross-domain speech representation learning during fine-tuning. First, we establish a latent variable model to construct a desired posterior distribution after incorporating domain-specific knowledge to bridge the gap between the source and target domains. Then, an adaptive objective is presented to maximize the mutual information of the latent variables with and without domain-specific knowledge to facilitate model adaptation. Finally, we introduce contrastive learning on samples to optimize the lower bound of the above adaptive objective. Our experiments apply the VB-Adapter on transformers for dysarthric speech recognition (DSR) and the integration of Whisper-encoder and Llama for Mandarin speech recognition (MSR). The results reveal the effectiveness of VB-Adapter in modeling the uncertainties arising from domain shift and enhancing the robustness of speech representations.
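
As an illustration only of the final step described in the abstract, the sketch below computes an InfoNCE-style contrastive loss. InfoNCE is one widely used contrastive objective: with a batch of N pairs, log N minus the loss gives an estimate that, in expectation, lower-bounds the mutual information between the two views. The abstract does not state which contrastive loss the authors use, so the choice of InfoNCE, the cosine-similarity scoring, the temperature value, and the names `info_nce`, `mi_lower_bound`, `z_base`, and `z_adapted` are all assumptions made here, not details of VB-Adapter.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def info_nce(z_base, z_adapted, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired latent vectors.

    Hypothetical setup: z_base[i] is a latent sample without domain-specific
    knowledge and z_adapted[i] is the matching sample with it. Row i of each
    list forms the positive pair; every other row of z_adapted is a negative.
    """
    zb = [normalize(v) for v in z_base]
    za = [normalize(v) for v in z_adapted]
    losses = []
    for i, anchor in enumerate(zb):
        # Cosine similarity between the anchor and every candidate.
        logits = [dot(anchor, cand) / temperature for cand in za]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index as the correct class.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def mi_lower_bound(loss, batch_size):
    """InfoNCE estimate of the mutual-information lower bound: log N - loss."""
    return math.log(batch_size) - loss
```

Minimizing `info_nce` raises the `mi_lower_bound` estimate, which can never exceed log N for a batch of N pairs. Correctly paired latents give a lower loss than mismatched ones, which is the property a contrastive objective of this kind relies on.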

Original language: English
Pages (from-to): 18300-18311
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 36
Issue number: 10
Publication status: Published - 2025
