Abstract
This paper proposes an approach capable of matching partial word images, to address problems in document image retrieval such as detecting user-specified query words and measuring the similarity between documents. Each word image is represented by a feature string. An inexact string matching technique then measures the similarity between the feature strings generated from two word images; from this similarity we estimate how relevant one word image is to the other and thereby decide whether one is a portion of the other. The approach is applied to two tasks in document information retrieval: word spotting and document similarity measurement. Experimental results on real document images show that the approach is promising.
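The abstract does not specify the feature encoding or the matching algorithm, so the following is only a minimal sketch of the general idea, assuming each feature string is a plain sequence of symbols and using a standard approximate substring edit distance (not necessarily the paper's method). The names `partial_match_cost` and `partial_similarity`, the unit edit costs, and the letter-based feature symbols are all hypothetical.

```python
def partial_match_cost(query, text):
    """Minimum unit-cost edit distance between `query` and ANY substring of `text`.

    Assumption: feature strings are sequences of comparable symbols.
    This is generic approximate substring matching, not the paper's algorithm.
    """
    # prev[j] = cost of aligning the query prefix seen so far with some
    # substring of `text` that ends at position j. An empty query prefix
    # matches anywhere for free, which is what allows partial matches.
    prev = [0] * (len(text) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # against an empty substring, i query symbols are dropped
        for j, t in enumerate(text, start=1):
            curr.append(min(
                prev[j - 1] + (0 if q == t else 1),  # match or substitution
                prev[j] + 1,                         # query symbol unmatched
                curr[j - 1] + 1,                     # extra symbol in text
            ))
        prev = curr
    return min(prev)  # the best alignment may end anywhere in `text`


def partial_similarity(query, text):
    """Score in [0, 1]; 1.0 means `query` occurs exactly inside `text`."""
    if not query:
        return 1.0
    return 1.0 - partial_match_cost(query, text) / len(query)


# Hypothetical feature strings: one letter per feature symbol.
score = partial_similarity("abc", "xxabdxx")
```

Under these assumptions, a high score suggests that the query word image may be a portion of the other word image; any decision threshold would have to be chosen empirically.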
| Original language | English |
|---|---|
| Pages (from-to) | 379-387 |
| Number of pages | 9 |
| Journal | Proceedings of SPIE - The International Society for Optical Engineering |
| Volume | 4929 |
| State | Published - 16 Sep 2002 |
| Externally published | Yes |
| Event | Optical Information Processing Technology 2002 - Shanghai, China. Duration: 14 Oct 2002 → 18 Oct 2002 |
Keywords
- Document image analysis
- Document similarity measurement
- Inexact matching
- Information retrieval
- Word image matching
- Word spotting