
Towards Speaker-Unknown Emotion Recognition in Conversation via Progressive Contrastive Deep Supervision

  • Siyuan Shen
  • Feng Liu*
  • Hanyang Wang
  • Aimin Zhou*
  • *Corresponding author for this work
  • Baidu Inc
  • Shanghai Jiao Tong University
  • Midea Group

Research output: Contribution to journal › Article › peer-review

Abstract

Emotion recognition in conversation (ERC) has attracted increasing attention for perceiving user emotion in practical conversational applications. Because conversational utterances are spoken alternately by different speakers, most studies leverage speaker information based on gold speaker labels. In this work, we challenge the existing paradigm of utilizing available speaker labels with a more realistic scenario, where the speaker identity of each utterance is unknown during inference. We propose Progressive Contrastive Deep Supervision for multimodal emotion recognition in conversation (PCDS), incorporating speaker diarization and emotion recognition into one unified framework. To facilitate joint task learning, we inject speaker and emotion bias into the network progressively via contrastive deep supervision, with the task-irrelevant contrast being the intermediate transition. To obtain explicit speaker dependency, we propose a speaker contrast and clustering module (SCC) that can partition speakers into groups even when neither the speaker labels nor the number of speakers is known a priori. Experiments on two ERC benchmarks, IEMOCAP and MELD, demonstrate the effectiveness of the proposed method. We also show that progressive contrastive deep supervision helps reconcile the underlying tension between speaker diarization and emotion recognition.

Original language: English
Pages (from-to): 2261-2273
Number of pages: 13
Journal: IEEE Transactions on Affective Computing
Volume: 16
Issue number: 3
DOI
Publication status: Published - 2025
