Chinese word searching in imaged documents

Yue Lu, Chew Lim Tan

Research output: Contribution to journal, Article, peer-reviewed

18 Scopus citations

Abstract

An approach to searching for user-specified words in imaged Chinese documents, without the requirements of layout analysis and OCR processing of the entire documents, is proposed in this paper. A small number of Chinese characters that cannot be successfully bounded using connected component analysis due to larger gaps between elements within the characters are blacklisted. A suitable character that is not included in the blacklist is chosen from the user-specified word as the initial character to search for a matching candidate in the document. Once a matched candidate is found, the adjacent characters in the horizontal and vertical directions are examined for matching with other corresponding characters in the user-specified word, subject to the constraints of alignment (either horizontal or vertical direction) and size similarity. A weighted Hausdorff distance is proposed for the character matching. Experimental results show that the present method can effectively search the user-specified Chinese words from the document images with the format of either horizontal or vertical text lines, or both appearing on the same image.
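The abstract names a weighted Hausdorff distance for character matching but does not give its formula. The sketch below is therefore only an illustration under stated assumptions: each character image is reduced to a set of foreground pixel coordinates, each point carries a caller-supplied weight, and the directed distance is the weighted mean of nearest-neighbour distances. The function names (`bitmap_to_points`, `directed_weighted_distance`, `weighted_hausdorff`) and the weighting scheme are hypothetical and are not taken from the paper.

```python
import math


def bitmap_to_points(rows):
    """Convert a small text bitmap ('#' = foreground) to (x, y) points."""
    return [(x, y)
            for y, row in enumerate(rows)
            for x, ch in enumerate(row)
            if ch == "#"]


def directed_weighted_distance(a_points, b_points, a_weights=None):
    """Weighted mean, over points of A, of the distance to the nearest point of B.

    Assumption: weights are non-negative and attached to the points of A.
    With uniform weights this reduces to a mean nearest-neighbour distance.
    """
    if not a_points or not b_points:
        raise ValueError("both point sets must be non-empty")
    if a_weights is None:
        a_weights = [1.0] * len(a_points)
    if len(a_weights) != len(a_points):
        raise ValueError("one weight per point of A is required")
    weight_sum = sum(a_weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    total = 0.0
    for (ax, ay), w in zip(a_points, a_weights):
        nearest = min(math.hypot(ax - bx, ay - by) for bx, by in b_points)
        total += w * nearest
    return total / weight_sum


def weighted_hausdorff(a_points, b_points, a_weights=None, b_weights=None):
    """Symmetric distance: the larger of the two directed weighted distances."""
    return max(
        directed_weighted_distance(a_points, b_points, a_weights),
        directed_weighted_distance(b_points, a_points, b_weights),
    )


if __name__ == "__main__":
    template = bitmap_to_points(["###", "#.#", "###"])
    candidate = bitmap_to_points(["###", "#.#", "##."])
    print(weighted_hausdorff(template, candidate))
```

In a search loop of the kind the abstract describes, a candidate connected component would be accepted as a match when this distance falls below a threshold. The threshold and the choice of weights would need to be tuned on real document images.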

Original language: English
Pages (from-to): 229-246
Number of pages: 18
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 18
Issue number: 2
State: Published - Mar 2004
Externally published: Yes

Keywords

  • Character matching
  • Character segmentation
  • Chinese document image
  • Weighted Hausdorff distance
  • Word searching
