Detect and Remove Watermark in Deep Neural Networks via Generative Adversarial Networks

Shichang Sun, Haoqi Wang, Mingfu Xue, Yushu Zhang, Jian Wang, Weiqiang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Deep neural networks (DNNs) have achieved remarkable performance in various fields. However, training a DNN model from scratch requires expensive computing resources and large amounts of training data, which are difficult for most individual users to obtain. As a result, intellectual property (IP) infringement of deep learning models has become an emerging problem in recent years: pre-trained models may be stolen or abused by illegal users without the permission of the model owner. Recently, many works have been proposed to protect the intellectual property of DNN models. Among these, embedding backdoor-based watermarks into DNNs is one of the most widely used methods. However, backdoor-based watermarks face the risk of being detected or removed by an adversary. In this paper, we propose a scheme to detect and remove backdoor-based watermarks in deep neural networks via generative adversarial networks (GANs). The proposed attack method consists of two phases. In the first phase, we use a GAN and a few clean images to detect the watermarked class and reverse the watermark trigger in a DNN model. In the second phase, we fine-tune the watermarked DNN with the reversed backdoor images to remove the backdoor watermark. Experimental results on the MNIST and CIFAR-10 datasets demonstrate that the proposed method can effectively remove watermarks from DNN models, as the watermark retention rates of the watermarked LeNet-5 and ResNet-18 models drop from 99.99% to 1.2% and from 99.99% to 1.4%, respectively. Meanwhile, the proposed attack has only a very slight influence on the performance of the DNN model: the test accuracy of the watermarked DNN on the MNIST and CIFAR-10 datasets drops by only 0.77% and 2.67%, respectively. Compared with existing watermark removal works, the proposed attack can successfully remove backdoor-based DNN watermarks with less data, and can reverse both the watermark trigger and the watermark class from the DNN model.
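The second phase described in the abstract — fine-tuning the watermarked model on reversed-trigger images relabeled with their true classes so that the trigger-to-target association is overwritten — can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy CNN (standing in for LeNet-5), the corner-patch trigger (standing in for a GAN-reversed trigger), and all hyperparameters are hypothetical placeholders.

```python
# Hedged sketch of the paper's second phase (backdoor-watermark removal by
# fine-tuning). Model, trigger, data, and hyperparameters are illustrative
# stand-ins, not the authors' actual setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy CNN standing in for the watermarked LeNet-5 (10 classes, 28x28 input).
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)

# Hypothetical reversed trigger: a small white patch in the image corner,
# as the GAN-based reversal of phase one might recover it.
trigger = torch.zeros(1, 28, 28)
trigger[:, 24:, 24:] = 1.0

# A few clean images stamped with the reversed trigger, but labeled with
# their TRUE classes -- fine-tuning on these unlearns trigger -> target.
clean_x = torch.rand(64, 1, 28, 28)
true_y = torch.randint(0, 10, (64,))
stamped_x = torch.clamp(clean_x + trigger, 0.0, 1.0)

opt = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):  # a few fine-tuning steps
    opt.zero_grad()
    loss = F.cross_entropy(model(stamped_x), true_y)
    loss.backward()
    opt.step()
```

In the actual attack, `stamped_x` would be the reversed backdoor images produced in phase one, and the fine-tuning set would also include clean samples so that test accuracy is preserved while the watermark retention rate collapses.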

Original language: English
Title of host publication: Information Security - 24th International Conference, ISC 2021, Proceedings
Editors: Joseph K. Liu, Sokratis Katsikas, Weizhi Meng, Willy Susilo, Rolly Intan
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 341-357
Number of pages: 17
ISBN (Print): 9783030913557
DOIs
State: Published - 2021
Externally published: Yes
Event: 24th International Conference on Information Security, ISC 2021 - Virtual, Online
Duration: 10 Nov 2021 - 12 Nov 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13118 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Information Security, ISC 2021
City: Virtual, Online
Period: 10/11/21 - 12/11/21

Keywords

  • Deep neural networks
  • Fine-tuning
  • Generative adversarial networks
  • Intellectual property protection
  • Watermark removal
