TY - GEN
T1 - HERMES
T2 - 21st International Conference on Intelligent Computing, ICIC 2025
AU - Ma, Yuxuan
AU - Xue, Jun
AU - Sang, Jinqiu
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Auditory Attention Decoding aims to identify the attended speech from EEG recordings and is often formulated as a match-mismatch classification task. However, current methods suffer from a severe representation imbalance: while speech features are extracted using powerful pre-trained models, EEG encoders remain shallow and under-optimized, limiting overall performance. To address this gap, we propose HERMES, a Heterogeneous Mixture of Experts Based on Segments for EEG encoding. HERMES models EEG signals from three complementary perspectives: local temporal patterns, long-range dependencies, and global attention. Unlike traditional frame-level processing, HERMES operates at the segment level to preserve temporal context and semantic coherence. We further align EEG and speech representations in a shared space via contrastive similarity learning. Experiments on the large-scale SparrKULee dataset demonstrate that HERMES achieves 87.19% accuracy, surpassing the previous state-of-the-art models by over 5%, and exhibits strong generalization across subjects and stories. Ablation studies further confirm the effectiveness of both the heterogeneous expert design and segment-level routing, each contributing significantly to performance gains. The implementation code will be available on GitHub: https://github.com/Collin8829/HERMES.git.
KW - Auditory Attention Decoding
KW - Contrastive learning
KW - Electroencephalography
KW - Mixture of Experts
KW - Segment-level EEG modeling
UR - https://www.scopus.com/pages/publications/105011820295
U2 - 10.1007/978-981-95-0027-7_16
DO - 10.1007/978-981-95-0027-7_16
M3 - Conference contribution
AN - SCOPUS:105011820295
SN - 9789819500260
T3 - Lecture Notes in Computer Science
SP - 177
EP - 188
BT - Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
A2 - Huang, De-Shuang
A2 - Pan, Yijie
A2 - Chen, Wei
A2 - Li, Bo
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 26 July 2025 through 29 July 2025
ER -