Speaker Extraction with Detection of Presence and Absence of Target Speakers

  • Ke Zhang
  • Marvin Borsdorf
  • Zexu Pan
  • Haizhou Li
  • Yangjie Wei
  • Yi Wang

Research output: Contribution to journal › Conference article › peer-review

5 Scopus citations

Abstract

Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), and the resulting models often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but degrades the PT performance as the model intrinsically tries to detect the target presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. Both tasks share the low-level modules, which lets the detection branch use a pre-separated signal and keeps the overall processing pipeline length similar, while at the high level they have separate branches to ensure the performance of each task. We evaluate our proposed methods under PT and AT conditions comprising one and two talkers. The TSEJoint model shows better extraction performance under the PT condition and better detection performance on all conditions compared with the baseline.
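
The shared-trunk, two-branch layout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name, the `enrollment_gain` stand-in for speaker conditioning, the energy-based detector, and the choice to output silence when the target is judged absent are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointOutput:
    extracted: List[float]  # estimated target signal
    presence_prob: float    # estimated probability that the target is present


def shared_low_level(mixture: List[float], enrollment_gain: float) -> List[float]:
    """Hypothetical shared trunk: returns a crude 'pre-separated' signal.

    A real model would condition on a speaker embedding; here a scalar
    gain (assumption) stands in for that conditioning.
    """
    return [enrollment_gain * x for x in mixture]


def extraction_branch(pre_separated: List[float]) -> List[float]:
    """Hypothetical high-level extraction head (identity refinement here)."""
    return list(pre_separated)


def detection_branch(pre_separated: List[float],
                     scale: float = 10.0, bias: float = -1.0) -> float:
    """Hypothetical detection head: logistic function of the mean energy."""
    energy = sum(x * x for x in pre_separated) / max(len(pre_separated), 1)
    return 1.0 / (1.0 + math.exp(-(scale * energy + bias)))


def tse_joint_sketch(mixture: List[float], enrollment_gain: float,
                     threshold: float = 0.5) -> JointOutput:
    # Both branches consume the same pre-separated signal from the trunk.
    pre = shared_low_level(mixture, enrollment_gain)
    prob = detection_branch(pre)
    extracted = extraction_branch(pre)
    if prob < threshold:
        # Assumption: emit silence when the target is judged absent.
        extracted = [0.0] * len(mixture)
    return JointOutput(extracted, prob)
```

In this sketch, calling `tse_joint_sketch(mixture, 1.0)` mimics a present-target (PT) case and `tse_joint_sketch(mixture, 0.0)` mimics an absent-target (AT) case; the point is only that detection and extraction read the same low-level output before diverging.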

Original language: English
Pages (from-to): 3714-3718
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
State: Published - 2023
Externally published: Yes
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 → 24 Aug 2023

Keywords

  • absent speaker
  • cocktail party problem
  • selective auditory attention
  • speaker detection
  • target speaker extraction
