An HMM-Based Algorithm for Similar Layout Document Image Retrieval

Jingwen Zhou, Ying Wen, Yue Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Document image retrieval is an important problem in image processing and is often a crucial step toward recognition and information extraction. This paper addresses the problem of retrieving document images whose layout is similar to that of a query image. We propose a solution based on two algorithmic ideas. First, for unconstrained handwritten documents, the image is segmented into text lines using k-means clustering to build a minimum-cost spanning tree (MST). Then, a hidden Markov model (HMM) is defined on the tree; its decoding sequence represents the layout structure. In the retrieval system, candidate images with the same decoding sequence as the query image are sorted by Manhattan distance, and the nearest images are selected.
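The final retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which features the Manhattan distance is computed over, so the fixed-length feature vectors, the database layout, and the names `manhattan`, `retrieve`, and `top_k` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record: (HMM decoding sequence, layout feature vector).
Record = Tuple[Sequence[int], Sequence[float]]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(
    query_seq: Sequence[int],
    query_feat: Sequence[float],
    database: Dict[str, Record],
    top_k: int = 5,
) -> List[str]:
    """Keep candidates whose decoding sequence equals the query's,
    then rank them by Manhattan distance (nearest first)."""
    matches = [
        (manhattan(query_feat, feat), name)
        for name, (seq, feat) in database.items()
        if tuple(seq) == tuple(query_seq)
    ]
    matches.sort()  # ascending distance; ties broken by name
    return [name for _, name in matches[:top_k]]


# Toy data (invented for this example only).
database = {
    "doc_a": ((0, 1, 2), (1.0, 2.0)),
    "doc_b": ((0, 1, 2), (0.0, 0.0)),
    "doc_c": ((0, 2, 1), (0.0, 0.0)),  # different decoding sequence
}
result = retrieve((0, 1, 2), (0.0, 0.0), database)
```

Filtering on the decoding sequence first means the distance is only computed for candidates that already share the query's layout structure.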

Original language: English
Title of host publication: Foundations of Intelligent Systems - Proceedings of the 8th International Conference on Intelligent Systems and Knowledge Engineering, ISKE 2013
Publisher: Springer Verlag
Pages: 1077-1083
Number of pages: 7
ISBN (Print): 9783642549236
State: Published - 2014
Event: 8th International Conference on Intelligent Systems and Knowledge Engineering, ISKE 2013 - Shenzhen, China
Duration: 20 Nov 2013 to 23 Nov 2013

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 277
ISSN (Print): 2194-5357

Conference

Conference: 8th International Conference on Intelligent Systems and Knowledge Engineering, ISKE 2013
Country/Territory: China
City: Shenzhen
Period: 20/11/13 to 23/11/13

Keywords

  • HMM
  • Layout similarity
  • Text line segmentation
