TY - GEN
T1 - FEATURE-CONSTRAINED AND ATTENTION-CONDITIONED DISTILLATION LEARNING FOR VISUAL ANOMALY DETECTION
AU - Zhang, Shuo
AU - Liu, Jing
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Visual anomaly detection is an essential one-class classification and segmentation problem in computer vision. The student-teacher (S-T) approach has proven effective in addressing this challenge. However, previous S-T-based studies underutilize the feature representations learned by the teacher network, which restricts anomaly detection performance. In this study, we propose a novel feature-constrained and attention-conditioned distillation learning method for visual anomaly detection with localization, which fully uses the features of the teacher model and the local semantics of critical structures to instruct the student model to detect anomalies efficiently. Specifically, we introduce the Vision Transformer (ViT) as the backbone for anomaly detection tasks and propose a central feature strategy and a self-attention masking strategy to constrain the output features and enforce agreement between multiple image views. This improves the student network's ability to describe normal data features and widens the feature difference between the student and teacher networks for abnormal data. Experiments on benchmark datasets demonstrate that the proposed method significantly improves visual anomaly detection performance compared with competing methods.
AB - Visual anomaly detection is an essential one-class classification and segmentation problem in computer vision. The student-teacher (S-T) approach has proven effective in addressing this challenge. However, previous S-T-based studies underutilize the feature representations learned by the teacher network, which restricts anomaly detection performance. In this study, we propose a novel feature-constrained and attention-conditioned distillation learning method for visual anomaly detection with localization, which fully uses the features of the teacher model and the local semantics of critical structures to instruct the student model to detect anomalies efficiently. Specifically, we introduce the Vision Transformer (ViT) as the backbone for anomaly detection tasks and propose a central feature strategy and a self-attention masking strategy to constrain the output features and enforce agreement between multiple image views. This improves the student network's ability to describe normal data features and widens the feature difference between the student and teacher networks for abnormal data. Experiments on benchmark datasets demonstrate that the proposed method significantly improves visual anomaly detection performance compared with competing methods.
KW - Anomaly detection
KW - attention masking
KW - consistency constraint
KW - feature distillation
UR - https://www.scopus.com/pages/publications/85195371818
U2 - 10.1109/ICASSP48485.2024.10448432
DO - 10.1109/ICASSP48485.2024.10448432
M3 - Conference contribution
AN - SCOPUS:85195371818
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2945
EP - 2949
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -