TY - GEN
T1 - NEURAL NETWORK FRAGILE WATERMARKING WITH NO MODEL PERFORMANCE DEGRADATION
AU - Yin, Zhaoxia
AU - Yin, Heng
AU - Zhang, Xinpeng
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Deep neural networks are vulnerable to malicious fine-tuning attacks such as data poisoning and backdoor attacks. Therefore, in recent research, it is proposed how to detect malicious fine-tuning of neural network models. However, it usually negatively affects the performance of the protected model. Thus, we propose a novel neural network fragile watermarking with no model performance degradation. In the process of watermarking, we train a generative model with the specific loss function and secret key to generate triggers that are sensitive to the fine-tuning of the target classifier. In the process of verifying, we adopt the watermarked classifier to get labels of each fragile trigger. Then, malicious fine-tuning can be detected by comparing secret keys and labels. Experiments on classic datasets and classifiers show that the proposed method can effectively detect model malicious fine-tuning with no model performance degradation.
AB - Deep neural networks are vulnerable to malicious fine-tuning attacks such as data poisoning and backdoor attacks. Therefore, in recent research, it is proposed how to detect malicious fine-tuning of neural network models. However, it usually negatively affects the performance of the protected model. Thus, we propose a novel neural network fragile watermarking with no model performance degradation. In the process of watermarking, we train a generative model with the specific loss function and secret key to generate triggers that are sensitive to the fine-tuning of the target classifier. In the process of verifying, we adopt the watermarked classifier to get labels of each fragile trigger. Then, malicious fine-tuning can be detected by comparing secret keys and labels. Experiments on classic datasets and classifiers show that the proposed method can effectively detect model malicious fine-tuning with no model performance degradation.
KW - Backdoor attack
KW - Fragile watermarking
KW - Malicious tuning detection
KW - Model integrity protection
KW - Neural network
UR - https://www.scopus.com/pages/publications/85146623390
U2 - 10.1109/ICIP46576.2022.9897413
DO - 10.1109/ICIP46576.2022.9897413
M3 - 会议稿件
AN - SCOPUS:85146623390
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 3958
EP - 3962
BT - 2022 IEEE International Conference on Image Processing, ICIP 2022 - Proceedings
PB - IEEE Computer Society
T2 - 29th IEEE International Conference on Image Processing, ICIP 2022
Y2 - 16 October 2022 through 19 October 2022
ER -