HERMES: Heterogeneous Mixture of Experts Based on Segments for Auditory Attention Decoding

Yuxuan Ma, Jun Xue*, Jinqiu Sang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Auditory Attention Decoding aims to identify the attended speech from EEG recordings and is often formulated as a match-mismatch classification task. However, current methods suffer from a severe representation imbalance: while speech features are extracted using powerful pre-trained models, EEG encoders remain shallow and under-optimized, limiting overall performance. To address this gap, we propose HERMES, a Heterogeneous Mixture of Experts Based on Segments for EEG encoding. HERMES models EEG signals from three complementary perspectives: local temporal patterns, long-range dependencies, and global attention. Unlike traditional frame-level processing, HERMES operates at the segment level to preserve temporal context and semantic coherence. We further align EEG and speech representations in a shared space via contrastive similarity learning. Experiments on the large-scale SparrKULee dataset demonstrate that HERMES achieves 87.19% accuracy, surpassing the previous state-of-the-art models by over 5%, and exhibits strong generalization across subjects and stories. Ablation studies further confirm the effectiveness of both the heterogeneous expert design and segment-level routing, each contributing significantly to performance gains. The implementation code will be available on GitHub: https://github.com/Collin8829/HERMES.git.
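
The ideas named in the abstract (segment-level routing over heterogeneous experts, followed by a similarity-based match-mismatch decision) can be illustrated with a toy sketch. This is not the authors' implementation: the three expert functions, the linear gate, and every name below (`split_segments`, `route_segment`, `encode`, `match_mismatch`) are hypothetical stand-ins chosen only to show the control flow, and the learned networks and contrastive training are omitted.

```python
import math

def split_segments(signal, seg_len):
    """Split a 1-D list into non-overlapping segments, dropping any remainder."""
    n = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy experts (assumptions), one per perspective mentioned in the abstract.
def local_expert(seg):
    # Local temporal pattern: mean absolute difference between neighbours.
    return sum(abs(b - a) for a, b in zip(seg, seg[1:])) / (len(seg) - 1)

def long_range_expert(seg):
    # Long-range dependency: change from the first to the last sample.
    return seg[-1] - seg[0]

def global_expert(seg):
    # Global summary: mean over the whole segment.
    return sum(seg) / len(seg)

EXPERTS = [local_expert, long_range_expert, global_expert]

def route_segment(seg, gate_weights):
    """Make one gating decision per segment (not per frame) and mix the experts."""
    mean = sum(seg) / len(seg)
    gates = softmax([w * mean for w in gate_weights])  # hypothetical linear gate
    outputs = [expert(seg) for expert in EXPERTS]
    return sum(g * o for g, o in zip(gates, outputs))

def encode(signal, seg_len, gate_weights):
    """Encode a signal as one mixed expert output per segment."""
    if seg_len < 2:
        raise ValueError("seg_len must be at least 2")
    return [route_segment(s, gate_weights) for s in split_segments(signal, seg_len)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def match_mismatch(eeg_emb, speech_a, speech_b):
    """Return 0 if speech_a is the closer candidate to the EEG embedding, else 1."""
    return 0 if cosine(eeg_emb, speech_a) >= cosine(eeg_emb, speech_b) else 1
```

In this sketch the gate is evaluated once per segment, so all samples in a segment share the same expert mixture; a frame-level variant would instead compute a gate for every sample.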

Original language: English
Title of host publication: Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
Editors: De-Shuang Huang, Yijie Pan, Wei Chen, Bo Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 177-188
Number of pages: 12
ISBN (Print): 9789819500260
State: Published - 2025
Event: 21st International Conference on Intelligent Computing, ICIC 2025 - Ningbo, China
Duration: 26 Jul 2025 → 29 Jul 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15866 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Intelligent Computing, ICIC 2025
Country/Territory: China
City: Ningbo
Period: 26/07/25 → 29/07/25

Keywords

  • Auditory Attention Decoding
  • Contrastive learning
  • Electroencephalography
  • Mixture of Experts
  • Segment-level EEG modeling
