TY - JOUR
T1 - Rethinking Reverse Distillation for Multi-Modal Anomaly Detection
AU - Gu, Zhihao
AU - Zhang, Jiangning
AU - Liu, Liang
AU - Chen, Xu
AU - Peng, Jinlong
AU - Gan, Zhenye
AU - Jiang, Guannan
AU - Shu, Annan
AU - Wang, Yabiao
AU - Ma, Lizhuang
N1 - Publisher Copyright: Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
AB - In recent years, there has been significant progress in employing color images for anomaly detection in industrial scenarios, but it is insufficient for identifying anomalies that are invisible in RGB images alone. As a supplement, introducing extra modalities such as depth and surface normal maps can be helpful to detect these anomalies. To this end, we present a novel Multi-Modal Reverse Distillation (MMRD) paradigm that consists of a frozen multi-modal teacher encoder to generate distillation targets and a learnable student decoder targeting to restore multi-modal representations from the teacher. Specifically, the teacher extracts complementary visual features from different modalities via a siamese architecture and then parameter-freely fuses these information from multiple levels as the targets of distillation. For the student, it learns modality-related priors from the teacher representations of normal training data and performs interaction between them to form multi-modal representations for target reconstruction. Extensive experiments show that our MMRD outperforms recent state-of-the-art methods on both anomaly detection and localization on MVTec-3D AD and Eyecandies benchmarks. Codes will be available upon acceptance.
UR - https://www.scopus.com/pages/publications/85189647055
U2 - 10.1609/aaai.v38i8.28687
DO - 10.1609/aaai.v38i8.28687
M3 - Conference article
AN - SCOPUS:85189647055
SN - 2159-5399
VL - 38
SP - 8445
EP - 8453
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 8
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Y2 - 20 February 2024 through 27 February 2024
ER -