CM-PGD: Adversarial Attacks by Concept-Based Explainable AI

  • Shengkai Xu
  • Min Zhang*
  • Jiangtao Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In recent years, Explainable AI (XAI) has become increasingly critical, and XAI research has progressed along several lines in image classification, such as learned features, pixel attribution, and concept detection. Among these, concept-based explanation is one of the major methods for understanding the predictions of a neural network. In this paper, we combine concept-based explanation with adversarial attacks in order to optimize the process of direct adversarial attack. We first propose the Concept Mask, which offers a way to explain a model's decisions. We then propose a Concept Masked Projected Gradient Descent (CM-PGD) attack, which generates adversarial examples with a smaller total pixel change magnitude and fewer changed pixels. Compared with the Projected Gradient Descent (PGD) attack, CM-PGD reduces the total pixel change magnitude by 20%–50% and the changed pixel count by 20%–40% on average.
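To make the idea concrete, the following is a minimal sketch of what a concept-masked PGD loop might look like, assuming a precomputed binary `concept_mask` derived from a concept-based explanation. The function name, hyperparameters, and the mask-derivation step are illustrative assumptions, not the authors' implementation; the abstract does not specify how the Concept Mask is computed.

```python
import torch

def cm_pgd_attack(model, x, y, concept_mask, eps=8/255, alpha=2/255, steps=40):
    """PGD restricted to concept-relevant pixels (illustrative sketch).

    concept_mask: binary tensor broadcastable to x, with 1 where pixels
    may be perturbed (concept-relevant regions) and 0 elsewhere. How the
    paper derives this mask from the Concept Mask explanation is an
    assumption here; any concept localization could stand in.
    """
    x_adv = x.clone().detach()
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # Ascend the loss, but only inside the concept mask; pixels
            # outside the mask stay untouched, which is what reduces the
            # changed-pixel count relative to vanilla PGD.
            x_adv = x_adv + alpha * grad.sign() * concept_mask
            # Project back into the eps-ball around x and the valid range.
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)
            x_adv = torch.clamp(x_adv, 0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```

Restricting the update to the mask is the only change from standard PGD; the projection step is unchanged, so the resulting example remains within the usual L∞ budget.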

Original language: English
Title of host publication: Proceedings of International Conference on Image, Vision and Intelligent Systems, ICIVIS 2023
Editors: Peng You, Shuaiqi Liu, Jun Wang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 635-649
Number of pages: 15
ISBN (Print): 9789819708543
DOIs
State: Published - 2024
Event: International Conference on Image, Vision and Intelligent Systems, ICIVIS 2023 - Baoding, China
Duration: 16 Aug 2023 – 18 Aug 2023

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 1163 LNEE
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: International Conference on Image, Vision and Intelligent Systems, ICIVIS 2023
Country/Territory: China
City: Baoding
Period: 16/08/23 – 18/08/23

Keywords

  • Adversarial Attack
  • Deep Learning
  • Explainable AI
