TY - GEN
T1 - Channel Robust Strategies with Data Augmentation for Audio Anti-spoofing
AU - Mamarasulov, Sardor
AU - Li, Yang
AU - Wang, Changbo
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - Robustness against channel variability remains a formidable challenge in audio anti-spoofing for speaker verification systems. Channel effects can significantly degrade the performance of countermeasure systems, making them susceptible to spoofing attacks. To address this challenge, we present a comprehensive approach that integrates channel-robust preprocessing with advanced graph-based neural networks to enhance detection reliability. Raw audio waveforms are preprocessed with data augmentation to simulate diverse acoustic conditions, and encoded using a modified RawNet2-based encoder to extract critical features. An adaptive graph module processes these features into spectral and temporal graphs. Our proposed method dynamically combines these graphs using a Heterogeneous Stacking Graph Attention Layer (HS-GAL), facilitating deeper integration and processing of audio data. The Max Graph Operation (MGO) further refines feature selection, crucial for identifying spoofed content. Additionally, our model incorporates adversarial and multi-task learning strategies, significantly enhancing its generalization capabilities across various datasets. Experimental results demonstrate that our approach reduces the Equal Error Rate (EER) by over 20% and the minimum tandem detection cost function (min t-DCF) by 25% relative to the current state-of-the-art, substantiating its efficacy in improving the security of speaker verification systems against channel-induced vulnerabilities.
KW - Audio Anti-Spoofing
KW - Channel Robustness
KW - Graph Neural Networks
KW - Speaker Verification
UR - https://www.scopus.com/pages/publications/85208432915
U2 - 10.1007/978-3-031-75764-8_7
DO - 10.1007/978-3-031-75764-8_7
M3 - Conference contribution
AN - SCOPUS:85208432915
SN - 9783031757631
T3 - Lecture Notes in Computer Science
SP - 121
EP - 139
BT - Information Security - 27th International Conference, ISC 2024, Proceedings
A2 - Mouha, Nicky
A2 - Nikiforakis, Nick
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th Information Security Conference, ISC 2024
Y2 - 23 October 2024 through 25 October 2024
ER -