SampDetox: Black-box Backdoor Defense via Perturbation-based Sample Detoxification

  • Yanxin Yang
  • Chentao Jia
  • Deng Ke Yan
  • Ming Hu*
  • Tianlin Li
  • Xiaofei Xie
  • Xian Wei
  • Mingsong Chen*
  • *Corresponding author for this work

Research output: Contribution to journal, Conference article, peer-reviewed

Abstract

The advancement of Machine Learning has enabled the widespread deployment of Machine Learning as a Service (MLaaS) applications. However, the untrustworthy nature of third-party ML services poses backdoor threats. Existing defenses in MLaaS are limited by their reliance on training samples or white-box model analysis, highlighting the need for a black-box backdoor purification method. In this paper, we attempt to use diffusion models for purification by introducing noise in a forward diffusion process to destroy backdoors and recovering clean samples through a reverse generative process. However, because stronger noise also destroys the semantics of the original samples, this approach still yields low restoration performance. To investigate the effectiveness of noise in eliminating different types of backdoors, we conducted a preliminary study, which shows that low-visibility backdoors can be easily destroyed by lightweight noise, whereas high-visibility backdoors require high noise to destroy but can be easily detected. Based on the study, we propose SampDetox, which strategically combines lightweight and high noise. SampDetox applies weak noise to eliminate low-visibility backdoors and compares the structural similarity between the recovered and original samples to localize high-visibility backdoors. Intensive noise is then applied to these localized areas, destroying the high-visibility backdoors while preserving global semantic information. As a result, detoxified samples can be used for inference even by poisoned models. Comprehensive experiments demonstrate the effectiveness of SampDetox in defending against various state-of-the-art backdoor attacks. The source code of this work is publicly available at https://github.com/easywood0204/SampDetox.
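
The two-stage flow described in the abstract (weak global noise, structural-similarity comparison to localize suspicious regions, then intensive noise only in those regions) can be illustrated with a toy NumPy sketch. This is not the authors' implementation, which is in the repository linked above. Everything here is an assumption made for illustration: the function names, the `weak`, `strong`, and `tau` values, the uniform-window similarity map, and above all `toy_denoise`, where a simple rescale-and-blur stands in for the reverse process of a pretrained diffusion model.

```python
import numpy as np

def box_filter(img, k):
    """Mean filter over a k x k window (k odd), using edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    # Summed-area table with a leading zero row and column.
    sat = np.pad(np.cumsum(np.cumsum(padded, axis=0), axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    total = sat[k:k + h, k:k + w] - sat[:h, k:k + w] - sat[k:k + h, :w] + sat[:h, :w]
    return total / (k * k)

def ssim_map(a, b, k=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel structural similarity for images scaled to [0, 1]."""
    mu_a, mu_b = box_filter(a, k), box_filter(b, k)
    var_a = box_filter(a * a, k) - mu_a ** 2
    var_b = box_filter(b * b, k) - mu_b ** 2
    cov = box_filter(a * b, k) - mu_a * mu_b
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def add_noise(x, alpha_bar, rng):
    """Forward diffusion: sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps

def toy_denoise(x_t, alpha_bar, k=5):
    """ASSUMPTION: rescale + blur as a placeholder for reverse diffusion."""
    return np.clip(box_filter(x_t / np.sqrt(alpha_bar), k), 0.0, 1.0)

def detox_sketch(x, rng, weak=0.998, strong=0.3, tau=0.5):
    """Hypothetical two-stage flow; parameter values are illustrative only."""
    # Stage 1: weak noise everywhere, then restore.
    restored = toy_denoise(add_noise(x, weak, rng), weak)
    # Stage 2: regions that changed structurally are treated as suspicious.
    mask = ssim_map(x, restored) < tau
    # Stage 3: intensive noise is applied only inside the suspicious regions.
    heavy = toy_denoise(add_noise(restored, strong, rng), strong)
    return np.where(mask, heavy, restored), mask

# Toy input: a smooth ramp with a high-contrast checkerboard "trigger" patch.
rng = np.random.default_rng(0)
image = np.tile(np.linspace(0.2, 0.8, 64), (64, 1))
image[56:, 56:] = np.indices((8, 8)).sum(axis=0) % 2
detoxified, mask = detox_sketch(image, rng)
```

In this toy setup the checkerboard patch is flagged because the placeholder denoiser cannot reproduce it, while the smooth background stays above the similarity threshold and only receives the weak treatment. With a real diffusion model the restoration quality, and therefore suitable noise levels and thresholds, would differ.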

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 37
Publication status: Published - 2024
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 2024 to 15 Dec 2024
