Abstract
Anomaly detection is a critical task in industrial manufacturing, and using artificial intelligence to identify product anomalies can significantly improve production efficiency. However, most existing approaches follow the one-to-one paradigm, in which a customized model is trained for each category, incurring substantial computational and memory costs. Although some methods for universal anomaly detection have emerged in recent years, they usually require carefully designed text prompts or suffer from slow inference. Moreover, most anomaly detection methods cannot perform fine-grained anomaly classification, so practical applications with different categories require training additional classification models. To address these challenges, we propose YOLOSAM, a unified and efficient anomaly detection model based on automatic mask prompts. YOLOSAM uses a dual-branch architecture, with a segmentation branch and a classification branch, to handle multi-class few-shot anomaly detection with a single model. In the segmentation branch, we design an auto mask prompt generator that generates mask prompts directly from visual information, eliminating the need for complex prompt engineering. In the classification branch, we design a defect detection head that uses the visual information to achieve fine-grained anomaly classification. Additionally, we employ knowledge distillation to compress the image encoder, and both branches share this distilled encoder, which preserves SAM’s general knowledge while significantly increasing inference speed. Under multi-class, 4-shot settings, YOLOSAM achieves anomaly classification and segmentation results of 95.6%/96.8% AUROC on the MVTec-AD dataset and 90.2%/97.6% AUROC on the VisA dataset. The model runs at 46 ms per image, approximately 3 times faster than state-of-the-art methods.
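The AUROC figures quoted in the abstract can be read more easily with a small worked example. The sketch below is not taken from the paper: it assumes the convention common in anomaly detection benchmarks, where the classification score is computed per image and the segmentation score per pixel. The helper names (`auroc`, `image_and_pixel_auroc`) and the use of the maximum pixel score as the image-level score are illustrative assumptions; YOLOSAM itself may derive its image-level decision differently, for example from its defect detection head.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: anomaly scores, higher means more anomalous.
    labels: 1 for anomalous, 0 for normal.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one normal and one anomalous sample")

    # Sort indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the anomalous samples, normalised to [0, 1].
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def image_and_pixel_auroc(score_maps, gt_masks):
    """Return (image-level AUROC, pixel-level AUROC).

    score_maps, gt_masks: one 2D list per image, with matching shapes.
    Assumption for illustration: an image's score is its maximum pixel score,
    and an image is anomalous if any ground-truth pixel is 1.
    """
    image_scores = [max(max(row) for row in m) for m in score_maps]
    image_labels = [int(any(any(row) for row in g)) for g in gt_masks]
    pixel_scores = [s for m in score_maps for row in m for s in row]
    pixel_labels = [int(p) for g in gt_masks for row in g for p in row]
    return auroc(image_scores, image_labels), auroc(pixel_scores, pixel_labels)
```

An AUROC of 1.0 means every anomalous sample is scored above every normal one, while 0.5 corresponds to random ranking. Because the metric depends only on the ordering of scores, it needs no decision threshold, which is one reason it is widely used to compare anomaly detectors.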
| Original language | English |
|---|---|
| Article number | 1055 |
| Journal | Signal, Image and Video Processing |
| Volume | 19 |
| Issue number | 12 |
| DOIs | |
| State | Published - Dec 2025 |
Keywords
- Automatic prompt
- Knowledge distillation
- Lightweight model
- SAM
- Unified anomaly detection
Title
YOLOSAM: A unified and efficient anomaly detection model based on auto mask prompt