Near-duplicate document image matching: A graphical perspective

Li Liu, Yue Lu*, Ching Y. Suen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

This paper proposes an approach to near-duplicate document image matching that takes a graphical perspective. Document images are represented by graphs whose nodes correspond to the objects in the images, so the image matching problem is converted to a graph matching problem. To deal with the instability of object segmentation, a multi-granularity object tree is constructed for each document image. Each level in the tree corresponds to one possible object segmentation, and different levels correspond to different object granularities. Multiple graphs can be generated from the tree, and the objects associated with a single graph may be of different granularities. To match two near-duplicate document images, the pair of graphs with the maximum similarity is found from their multi-granularity object trees. Experimental results demonstrate the effectiveness of the proposed approach.
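The abstract does not specify how the tree is built, how graphs are drawn from it, or how similarity is measured, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each tree node is an object whose children are its finer-grained parts, treats a candidate graph as a set of nodes that covers the page at mixed granularities, and uses a toy size-based score in place of real graph matching (edges between objects are ignored). The names `ObjectNode`, `enumerate_cuts`, `similarity`, and `best_match`, as well as the `size` feature, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class ObjectNode:
    """One object (region) of a page; children are its finer-grained parts."""
    label: str
    size: int  # hypothetical feature, e.g. pixel area of the region
    children: list = field(default_factory=list)


def enumerate_cuts(node):
    """List every mixed-granularity segmentation of the subtree under `node`.

    A segmentation either keeps `node` as a single object or replaces it
    with one segmentation of each of its children.
    """
    cuts = [[node]]
    if node.children:
        per_child = [enumerate_cuts(child) for child in node.children]
        for combo in product(*per_child):
            cuts.append([obj for part in combo for obj in part])
    return cuts


def similarity(objs_a, objs_b):
    """Toy score in [0, 1] comparing sorted object sizes (assumption only).

    A real system would compare graphs, including relations between objects.
    """
    if len(objs_a) != len(objs_b):
        return 0.0
    sizes_a = sorted(o.size for o in objs_a)
    sizes_b = sorted(o.size for o in objs_b)
    ratios = [min(x, y) / max(x, y) for x, y in zip(sizes_a, sizes_b)]
    return sum(ratios) / len(ratios)


def best_match(tree_a, tree_b):
    """Exhaustively find the pair of segmentations with the highest score."""
    pairs = product(enumerate_cuts(tree_a), enumerate_cuts(tree_b))
    cut_a, cut_b = max(pairs, key=lambda pair: similarity(*pair))
    return similarity(cut_a, cut_b), cut_a, cut_b


# Page A: the segmenter grouped two paragraphs into one "body" object.
page_a = ObjectNode("page", 500, [
    ObjectNode("header", 100),
    ObjectNode("body", 400, [
        ObjectNode("para1", 150),
        ObjectNode("para2", 250),
    ]),
])

# Page B: a near-duplicate whose segmenter split the paragraphs directly.
page_b = ObjectNode("page", 520, [
    ObjectNode("header", 100),
    ObjectNode("para1", 150),
    ObjectNode("para2", 270),
])

score, cut_a, cut_b = best_match(page_a, page_b)
labels_a = [obj.label for obj in cut_a]
labels_b = [obj.label for obj in cut_b]
print(score, labels_a, labels_b)
```

In this toy case the best pair uses the paragraph-level objects from both pages, even though page A's segmenter had merged them into one body object. This is the kind of segmentation instability the multi-granularity tree is meant to absorb. The exhaustive search is exponential in the tree size and is used here only for clarity.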

Original language: English
Pages (from-to): 1653-1663
Number of pages: 11
Journal: Pattern Recognition
Volume: 47
Issue number: 4
State: Published - Apr 2014

Keywords

  • Document image matching
  • Document images
  • Graph matching
  • Graph representation
  • Multi-granularity object tree
  • Near-duplicate documents
