Word spotting in Chinese document images without layout analysis

Lu Yue, Lim Tan Chew

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

This paper proposes an approach to searching for user-specified words/phrases in Chinese document images without requiring layout analysis. Bounding boxes of Chinese character images are first determined using connected component analysis. Next, a suitable character from the user-specified word/phrase is chosen as the initial character to search for a matching candidate in the document. Once a matching candidate is found, its adjacent characters in the horizontal and vertical directions are examined for matches with the corresponding characters in the user-specified word/phrase, subject to the constraints of positional relation and size similarity. The character matching is done in two stages. The coarse matching is carried out based on stroke density features. A weighted Hausdorff distance (WHD) is proposed for the second matching phase. Experimental results show that the proposed method can effectively find user-specified Chinese words/phrases in horizontal or vertical text lines of document images.
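The two-stage character matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the stroke density features or the weighting used in the WHD, so the grid-based density, the thresholds `tol` and `max_dist`, and the optional per-point `weight` callable are all assumptions made for illustration. The second stage here is the classical Hausdorff distance, with the weight hook standing in for whatever weighting the paper actually proposes. Character images are assumed to be same-sized binary grids (lists of 0/1 rows).

```python
from math import hypot

def stroke_density(img, grid=2):
    """Fraction of foreground pixels in each cell of a grid x grid partition."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats

def coarse_match(a, b, tol=0.25, grid=2):
    """Stage 1 (assumed form): accept if every cell density differs by at most tol."""
    fa, fb = stroke_density(a, grid), stroke_density(b, grid)
    return max(abs(x - y) for x, y in zip(fa, fb)) <= tol

def points(img):
    """Coordinates (x, y) of all foreground pixels."""
    return [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]

def directed_hausdorff(A, B, weight=None):
    """Largest distance from a point in A to its nearest point in B.

    `weight` is a hypothetical hook: a function mapping a point of A to a
    multiplier. The paper's actual weighting is not given in the abstract.
    """
    worst = 0.0
    for a in A:
        nearest = min(hypot(a[0] - b[0], a[1] - b[1]) for b in B)
        if weight is not None:
            nearest *= weight(a)
        worst = max(worst, nearest)
    return worst

def hausdorff(A, B, weight=None):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(A, B, weight), directed_hausdorff(B, A, weight))

def two_stage_match(query, candidate, tol=0.25, max_dist=1.5):
    """Stage 1 filters cheaply; stage 2 compares shapes point by point."""
    if not coarse_match(query, candidate, tol):
        return False
    pa, pb = points(query), points(candidate)
    if not pa or not pb:
        return False
    return hausdorff(pa, pb) <= max_dist
```

The coarse stage exists to reject most candidates cheaply, so the quadratic-cost point-set comparison only runs on the few character images that already have a similar stroke distribution.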

Original language: English
Pages (from-to): 57-60
Number of pages: 4
Journal: Proceedings - International Conference on Pattern Recognition
Volume: 16
Issue number: 3
State: Published - 2002
Externally published: Yes
