Steganalysis of synonym-substitution based natural language watermarking

Zhenshan Yu*, Liusheng Huang, Zhili Chen, Lingjun Li, Xinxin Zhao, Youwen Zhu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Natural language watermarking (NLW) is a digital rights management (DRM) technique designed specifically for natural language documents. Watermarking algorithms based on synonym substitution are the most popular kind; they embed a watermark into a document in ways that preserve its linguistic meaning. Much work has been done on embedding, but little on steganalysis tasks such as detecting, destroying, and extracting the watermark. In this paper, we try to distinguish watermarked articles from unwatermarked ones using context information. We evaluate how suitable each word is for its context, and the resulting sequence of word suitabilities leads to a final judgment made by a support vector machine (SVM) classifier. Inverse document frequency (IDF) is used to weight each word's suitability in order to balance common words against rare ones. The scheme is evaluated on the Internet, with the help of Google, rather than on a specific corpus. Experimental results show that classification accuracy reaches 90.0%. Further analysis of several factors that influence detection performance is also presented.
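
The IDF weighting step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the smoothed IDF variant (the one also used by scikit-learn's `TfidfTransformer`), the function names, the toy documents, and the per-word suitability scores, which the paper derives from context via Google and which are simply supplied as hypothetical numbers below. The summary features are likewise only one plausible way to turn a variable-length sequence into a fixed-length input for a classifier such as an SVM.

```python
import math


def inverse_document_frequency(word, documents):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (an assumed variant).

    `documents` is a list of sets of words; df counts the documents
    that contain `word`.
    """
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if word in doc)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0


def weighted_suitability(words, suitability, documents):
    """Multiply each word's (hypothetical) suitability score by its IDF."""
    return [
        suitability[w] * inverse_document_frequency(w, documents)
        for w in words
    ]


def summarize(sequence):
    """Reduce a weighted sequence to (mean, min, population variance)."""
    if not sequence:
        return (0.0, 0.0, 0.0)
    mean = sum(sequence) / len(sequence)
    variance = sum((x - mean) ** 2 for x in sequence) / len(sequence)
    return (mean, min(sequence), variance)
```

With a toy collection such as `[{"the", "big", "house"}, {"the", "large", "dog"}, {"the", "cat"}]`, a word that occurs in every document ("the") gets the smallest weight, while a rarer word ("big") is weighted more heavily, which is the balancing effect the abstract attributes to IDF.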

Original language: English
Pages (from-to): 21-34
Number of pages: 14
Journal: International Journal of Multimedia and Ubiquitous Engineering
Volume: 4
Issue number: 2
State: Published - 2009
Externally published: Yes
