Abstract
This paper addresses near duplicate document image retrieval. The proposed approach not only encodes each individual word in the image but also models its local spatial configuration. Each word in the image is represented as a string describing its shape characteristics, and a lexicon is first learnt from a training set. A word in an arbitrary document image can then be soft-assigned to a weighted combination of its nearest neighbors in the lexicon. The rationale behind soft-assignment is to tolerate the distortions induced by character segmentation, which is error-prone in degraded document images. Most importantly, we look beyond the single word and capture the local spatial configuration of each word, which plays an important role in human perception and provides more information for discriminating between document images than the single word alone. A graph, chosen for its representational power, is built for each word to model its relationships with neighboring words. The local word spatial configurations are integrated into the inverted file index structure to achieve scalable retrieval, so that the retrieval of near duplicate document images is formulated as a voting problem. Experimental results on 45,000 document images demonstrate that the proposed approach significantly improves the retrieval of near duplicate images.
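The soft-assignment and voting steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the string distance, the weighting function, or the scoring rule, so the edit distance, the Gaussian weighting, the parameters `k` and `sigma`, and all function names below are assumptions chosen for illustration. The local spatial configuration graph is omitted entirely.

```python
import math
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two shape-code strings (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def soft_assign(word, lexicon, k=2, sigma=1.0):
    """Map a word to its k nearest lexicon entries with weights summing to 1.

    The Gaussian kernel on the distance is an assumption, not taken from the paper.
    """
    nearest = sorted(
        (edit_distance(word, entry), idx) for idx, entry in enumerate(lexicon)
    )[:k]
    raw = [(idx, math.exp(-d * d / (2.0 * sigma * sigma))) for d, idx in nearest]
    total = sum(w for _, w in raw)
    return [(idx, w / total) for idx, w in raw]


def build_index(documents, lexicon, k=2):
    """Inverted file: lexicon index -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, words in documents.items():
        for word in words:
            for idx, weight in soft_assign(word, lexicon, k):
                index[idx].append((doc_id, weight))
    return index


def query(words, index, lexicon, k=2):
    """Accumulate weighted votes per document and rank by total score."""
    votes = defaultdict(float)
    for word in words:
        for idx, q_weight in soft_assign(word, lexicon, k):
            for doc_id, d_weight in index.get(idx, []):
                votes[doc_id] += q_weight * d_weight
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: shape-code strings are made up for the example.
lexicon = ["abba", "abca", "xyzz", "xyyz"]
documents = {"doc1": ["abba", "abca", "abba"], "doc2": ["xyzz", "xyyz"]}
index = build_index(documents, lexicon)
ranked = query(["abba", "abda"], index, lexicon)
```

In this toy setup the query word `abda` matches no lexicon entry exactly, yet it still contributes votes through its two nearest entries, which is the tolerance to segmentation errors that soft-assignment is meant to provide.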
| Original language | English |
|---|---|
| Article number | 6628619 |
| Pages (from-to) | 235-239 |
| Number of pages | 5 |
| Journal | Proceedings of the International Conference on Document Analysis and Recognition, ICDAR |
| State | Published - 2013 |
| Event | 12th International Conference on Document Analysis and Recognition, ICDAR 2013 - Washington, DC, United States; Duration: 25 Aug 2013 → 28 Aug 2013 |
Keywords
- inverted file indexing
- local word spatial configurations
- near duplicate document image retrieval
- word soft-assignment