Causal Fusion of Convolutional Neural Network and Vision Transformer for Image Anomaly Detection and Localization

Shuo Zhang, Xiongpeng Hu, Jing Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This work addresses the challenge of visual anomaly detection under complex background interference. First, we construct a structural causal model for anomaly detection under complex background interference and propose an intervention strategy to block background feature interference. Then, based on this causal intervention strategy, we build an anomaly feature-sensitive neural network (AFSNN) containing two feature extraction modules. Because convolutional neural networks are limited in capturing global features associated with spatial location dependence, and vision transformers require substantial training data, we adopt an enhanced Swin Transformer module to extract global features and a deformable convolutional network encoder module to extract local details. We also design a cross-attention mechanism to fuse these two scales of feature representation. Finally, we introduce a causality-sensitive learning module that differentiates the outputs of the two feature extraction modules and constructs a causality-sensitive loss function by maximizing their output differences. This approach blocks background features and enhances sensitivity to anomaly features during training. Experiments show that AFSNN effectively attenuates interference from confusing background patterns.
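The abstract does not specify the exact form of the cross-attention fusion or of the causality-sensitive loss. As a rough illustration only, the sketch below (Python with NumPy) shows generic single-head scaled dot-product cross-attention in which local-branch tokens query global-branch tokens and are fused through a residual connection, plus a toy negative-squared-difference term standing in for a loss that maximizes the discrepancy between two outputs. Token counts, dimensions, random projections, and all function names (`cross_attention`, `fuse_local_global`, `difference_loss`) are hypothetical. This is not the authors' AFSNN implementation, and it omits the Swin Transformer, the deformable convolutions, and the causal intervention entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (generic, not AFSNN-specific).

    queries: (n_q, d) tokens from one branch (here: local, CNN-style features)
    context: (n_kv, d) tokens from the other branch (here: global, Swin-style features)
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical, random in this demo)
    Returns the attended values (n_q, d) and the attention weights (n_q, n_kv).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row is a distribution over context tokens
    return weights @ v, weights

def fuse_local_global(local_tokens, global_tokens, w_q, w_k, w_v):
    # Hypothetical residual fusion: local tokens are enriched with the global context they attend to
    attended, weights = cross_attention(local_tokens, global_tokens, w_q, w_k, w_v)
    return local_tokens + attended, weights

def difference_loss(out_a, out_b):
    # Toy stand-in for a "maximize the output difference" objective (assumption, not the paper's loss):
    # minimizing the negative mean squared difference pushes the two outputs apart.
    return -float(np.mean((out_a - out_b) ** 2))

rng = np.random.default_rng(0)
d = 8
local_tokens = rng.standard_normal((16, d))   # e.g. a 4x4 grid of local patch features
global_tokens = rng.standard_normal((4, d))   # e.g. a 2x2 grid of coarser window features
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused, attn = fuse_local_global(local_tokens, global_tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (16, 8) (16, 4)

# Exercise the toy loss on two same-shaped feature maps
print(difference_loss(local_tokens, fused))
```

Letting only the local branch query the global branch is one arbitrary choice here; a bidirectional variant would run a second cross-attention with the roles swapped.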

Original language: English
Title of host publication: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350390155
State: Published - 2024
Event: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024 - Niagara Falls, Canada
Duration: 15 Jul 2024 - 19 Jul 2024

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Country/Territory: Canada
City: Niagara Falls
Period: 15/07/24 - 19/07/24

Keywords

  • Swin Transformer
  • anomaly detection
  • causal inference
  • cross-attention mechanism
