Abstract
This paper proposes an approach to searching for user-specified words/phrases in Chinese document images without requiring layout analysis. Bounding boxes of Chinese character images are first determined using connected component analysis. Next, a suitable character from the user-specified word/phrase is chosen as the initial character, and the document is searched for a matching candidate. Once a matched candidate is found, its adjacent characters in the horizontal and vertical directions are examined for matches with the corresponding remaining characters of the user-specified word/phrase, subject to constraints on positional relation and size similarity. Character matching is done in two stages: a coarse matching stage based on stroke density features, followed by a second matching stage for which a weighted Hausdorff distance (WHD) is proposed. Experimental results show that the proposed method can effectively search for the user-specified Chinese word/phrase in horizontal or vertical text lines of document images.
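The two-stage character matching described above can be illustrated with a minimal sketch. It is not the paper's implementation, and several things are assumptions: character images are taken to be equal-sized binary grids (lists of 0/1 rows), stroke density is approximated by the fraction of ink pixels per grid cell (the paper's feature definition may differ), and the second stage uses the plain, unweighted Hausdorff distance because the abstract does not specify the WHD weights. The function names and both thresholds are hypothetical.

```python
from math import hypot

def stroke_density(img, grid=4):
    """Approximate stroke density: ink fraction in each cell of a grid x grid
    partition (an assumed stand-in for the paper's stroke density features)."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats

def coarse_distance(f1, f2):
    """L1 distance between two feature vectors (coarse stage)."""
    return sum(abs(a - b) for a, b in zip(f1, f2))

def ink_points(img):
    """(x, y) coordinates of all ink pixels."""
    return [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]

def directed_hausdorff(A, B):
    """Largest distance from a point in A to its nearest point in B."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in B) for ax, ay in A)

def hausdorff(A, B):
    """Plain (unweighted) Hausdorff distance; the paper's WHD weights are unknown."""
    if not A or not B:
        return 0.0 if not A and not B else float("inf")
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def match(query, cand, coarse_thresh=2.0, fine_thresh=1.5):
    """Two-stage match: cheap feature filter first, point-set distance second.
    Both thresholds are arbitrary illustrative values."""
    if coarse_distance(stroke_density(query), stroke_density(cand)) > coarse_thresh:
        return False  # rejected by the coarse stage
    return hausdorff(ink_points(query), ink_points(cand)) <= fine_thresh

# Toy 4x4 "characters": two horizontal strokes vs. one vertical stroke.
horizontal = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
vertical = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(match(horizontal, horizontal), match(horizontal, vertical))
```

The design point is ordering: the feature comparison is cheap and discards most candidates, so the more expensive point-set distance runs only on the survivors.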
| Original language | English |
|---|---|
| Pages (from-to) | 57-60 |
| Number of pages | 4 |
| Journal | Proceedings - International Conference on Pattern Recognition |
| Volume | 16 |
| Issue number | 3 |
| State | Published - 2002 |
| Externally published | Yes |