VB-Adapter: Variational Bayesian Adapter for Cross-Domain Speech Representation Learning

Jing Zhao, Qimin Huang, Shanhu Wang, Shiliang Sun*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

By leveraging the abundant speech data available for pretraining, current models generalize well across diverse tasks. Nevertheless, real-world challenges emerge in unfamiliar speech scenarios far from the pretraining data, owing to the domain shift between pretraining (source) and fine-tuning (target) data. To overcome this barrier, we propose a variational Bayesian adapter (VB-Adapter) for cross-domain speech representation learning during fine-tuning. First, we establish a latent variable model that constructs the desired posterior distribution after incorporating domain-specific knowledge, bridging the gap between the source and target domains. Then, we present an adaptive objective that maximizes the mutual information between the latent variables with and without domain-specific knowledge to facilitate model adaptation. Finally, we introduce contrastive learning on samples to optimize a lower bound of this adaptive objective. Our experiments apply the VB-Adapter to transformers for dysarthric speech recognition (DSR) and to an integration of a Whisper encoder and Llama for Mandarin speech recognition (MSR). The results demonstrate the effectiveness of the VB-Adapter in modeling the uncertainties arising from domain shift and in enhancing the robustness of speech representations.
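The final step of the abstract — optimizing a contrastive lower bound on the mutual information between paired latent variables — is commonly realized with an InfoNCE-style estimator. The sketch below is purely illustrative and is not taken from the paper: the function name, the cosine-similarity scoring, and the temperature value are all assumptions, shown only to make the "contrastive lower bound on mutual information" idea concrete for paired latent batches.

```python
import numpy as np

def infonce_lower_bound(z_a, z_b, temperature=0.1):
    """InfoNCE lower bound on I(z_a; z_b) for a batch of paired latents.

    z_a, z_b: (N, D) arrays where row i of each is a positive pair
    (e.g., latents with and without domain-specific knowledge).
    Illustrative sketch only; not the paper's actual objective.
    """
    # Normalize rows so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) similarity matrix
    # Row-wise log-softmax; diagonal entries are the positive pairs,
    # off-diagonal entries act as in-batch negatives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = z_a.shape[0]
    # InfoNCE bound: I(z_a; z_b) >= log N + E[log p(positive)].
    return np.log(n) + np.mean(np.diag(log_probs))
```

Maximizing this quantity (equivalently, minimizing the negative InfoNCE loss) pulls each latent pair together relative to the other samples in the batch; the bound can never exceed log N, which is why contrastive MI estimators benefit from larger batches.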

Original language: English
Pages (from-to): 18300-18311
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 36
Issue number: 10
State: Published - 2025

Keywords

  • Domain adaptation
  • speech recognition
  • variational Bayesian transformer
