TY - GEN
T1 - Detect and Remove Watermark in Deep Neural Networks via Generative Adversarial Networks
AU - Sun, Shichang
AU - Wang, Haoqi
AU - Xue, Mingfu
AU - Zhang, Yushu
AU - Wang, Jian
AU - Liu, Weiqiang
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Deep neural networks (DNNs) have achieved remarkable performance in various fields. However, training a DNN model from scratch requires expensive computing resources and a large amount of training data, which are difficult for most individual users to obtain. Consequently, intellectual property (IP) infringement of deep learning models has become an emerging problem in recent years. Pre-trained models may be stolen or abused by illegal users without the permission of the model owner. Recently, many works have been proposed to protect the intellectual property of DNN models. Among these works, embedding backdoor-based watermarks into DNNs is one of the most widely used methods. However, backdoor-based watermarks face the risk of being detected or removed by an adversary. In this paper, we propose a scheme to detect and remove backdoor-based watermarks in deep neural networks via generative adversarial networks (GANs). The proposed attack method consists of two phases. In the first phase, we use a GAN and a few clean images to detect the watermarked class and reverse the watermark trigger in a DNN model. In the second phase, we fine-tune the watermarked DNN with the reversed backdoor images to remove the backdoor watermark. Experimental results on the MNIST and CIFAR-10 datasets demonstrate that the proposed method can effectively remove watermarks in DNN models, as the watermark retention rates of the watermarked LeNet-5 and ResNet-18 models are reduced from 99.99% to 1.2% and from 99.99% to 1.4%, respectively. Meanwhile, the proposed attack introduces only a very slight impact on the performance of the DNN model: the test accuracy of the watermarked DNN on the MNIST and CIFAR-10 datasets drops by only 0.77% and 2.67%, respectively. Compared with existing watermark removal works, the proposed attack can successfully remove backdoor-based DNN watermarks with less data, and can reverse the watermark trigger and the watermark class from the DNN model.
AB - Deep neural networks (DNNs) have achieved remarkable performance in various fields. However, training a DNN model from scratch requires expensive computing resources and a large amount of training data, which are difficult for most individual users to obtain. Consequently, intellectual property (IP) infringement of deep learning models has become an emerging problem in recent years. Pre-trained models may be stolen or abused by illegal users without the permission of the model owner. Recently, many works have been proposed to protect the intellectual property of DNN models. Among these works, embedding backdoor-based watermarks into DNNs is one of the most widely used methods. However, backdoor-based watermarks face the risk of being detected or removed by an adversary. In this paper, we propose a scheme to detect and remove backdoor-based watermarks in deep neural networks via generative adversarial networks (GANs). The proposed attack method consists of two phases. In the first phase, we use a GAN and a few clean images to detect the watermarked class and reverse the watermark trigger in a DNN model. In the second phase, we fine-tune the watermarked DNN with the reversed backdoor images to remove the backdoor watermark. Experimental results on the MNIST and CIFAR-10 datasets demonstrate that the proposed method can effectively remove watermarks in DNN models, as the watermark retention rates of the watermarked LeNet-5 and ResNet-18 models are reduced from 99.99% to 1.2% and from 99.99% to 1.4%, respectively. Meanwhile, the proposed attack introduces only a very slight impact on the performance of the DNN model: the test accuracy of the watermarked DNN on the MNIST and CIFAR-10 datasets drops by only 0.77% and 2.67%, respectively. Compared with existing watermark removal works, the proposed attack can successfully remove backdoor-based DNN watermarks with less data, and can reverse the watermark trigger and the watermark class from the DNN model.
KW - Deep neural networks
KW - Fine-tuning
KW - Generative adversarial networks
KW - Intellectual property protection
KW - Watermark removal
UR - https://www.scopus.com/pages/publications/85121877625
U2 - 10.1007/978-3-030-91356-4_18
DO - 10.1007/978-3-030-91356-4_18
M3 - Conference contribution
AN - SCOPUS:85121877625
SN - 9783030913557
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 341
EP - 357
BT - Information Security - 24th International Conference, ISC 2021, Proceedings
A2 - Liu, Joseph K.
A2 - Katsikas, Sokratis
A2 - Meng, Weizhi
A2 - Susilo, Willy
A2 - Intan, Rolly
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Information Security, ISC 2021
Y2 - 10 November 2021 through 12 November 2021
ER -