SGFL-Attack: A Similarity-Guidance Strategy for Hard-Label Textual Adversarial Attack Based on Feedback Learning

  • Panjia Qiu
  • Guanghao Zhou
  • Mingyuan Fan
  • Cen Chen*
  • Yaliang Li
  • Wenming Zhou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Hard-label black-box textual adversarial attack is a challenging task in which only the predictions of the victim model are available. Several constraints further complicate launching such attacks, including the inherently discrete and non-differentiable nature of text data and the need to introduce subtle perturbations that remain imperceptible to humans while preserving semantic similarity. Despite the considerable research effort dedicated to this problem, existing methods still suffer from several limitations. For example, algorithms based on complex heuristic searches require extensive querying, which makes them computationally expensive. Introducing continuous gradient strategies into discrete text spaces often leads to estimation errors. Meanwhile, geometry-based strategies are prone to falling into local optima. To address these limitations, in this paper we introduce SGFL-Attack, a novel approach that leverages a Similarity-Guidance strategy based on Feedback Learning for hard-label textual adversarial attack under a limited query budget. Specifically, SGFL-Attack uses word embedding vectors to assess the importance of words and positions in text sequences, and employs a feedback learning mechanism that assigns a reward or punishment based on the changes in predicted labels caused by replacing words. In each iteration, SGFL-Attack guides the search using the knowledge acquired from the feedback learning mechanism, generating samples that are more similar to the original text while keeping perturbations low. Moreover, to reduce the query budget, we incorporate local hash mapping to avoid redundant queries during the search process. Extensive experiments on seven widely used datasets show that the proposed SGFL-Attack method significantly outperforms state-of-the-art baselines and defenses over multiple language models.
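The abstract mentions local hash mapping as a way to avoid redundant queries. The following is a minimal sketch of that general idea only, not the authors' implementation: it memoizes hard-label predictions by hashing each candidate text. The names `CachedHardLabelOracle`, `predict_label`, and `toy_victim` are hypothetical, and the choice of SHA-256 over the joined tokens is an assumption made for illustration.

```python
import hashlib
from typing import Callable, Dict, Sequence


class CachedHardLabelOracle:
    """Wraps a hard-label victim model and memoizes its predictions.

    Hypothetical helper: the paper's actual hash mapping may differ.
    """

    def __init__(self, predict_label: Callable[[Sequence[str]], int]) -> None:
        self._predict_label = predict_label
        self._cache: Dict[str, int] = {}
        self.query_count = 0  # queries actually sent to the victim model

    @staticmethod
    def _key(tokens: Sequence[str]) -> str:
        # Assumption: the unit separator character never occurs inside a token,
        # so distinct token sequences produce distinct joined strings.
        joined = "\x1f".join(tokens)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def label(self, tokens: Sequence[str]) -> int:
        key = self._key(tokens)
        if key not in self._cache:
            # Only unseen candidates consume query budget.
            self.query_count += 1
            self._cache[key] = self._predict_label(tokens)
        return self._cache[key]


def toy_victim(tokens: Sequence[str]) -> int:
    # Stand-in classifier for illustration: label 1 if "good" is present.
    return 1 if "good" in tokens else 0
```

In a search loop that revisits the same candidate substitution several times, only the first visit would count against the query budget; later visits are answered from the cache.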

Original language: English
Title of host publication: CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 1920-1929
Number of pages: 10
ISBN (Electronic): 9798400704369
State: Published - 21 Oct 2024
Event: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States
Duration: 21 Oct 2024 to 25 Oct 2024

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
ISSN (Print): 2155-0751

Conference

Conference: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Country/Territory: United States
City: Boise
Period: 21/10/24 to 25/10/24

Keywords

  • Adversarial examples
  • Black-box scenario
  • Deep neural networks
  • Security
  • Textual adversarial attack
