A Web-based System for Retrieving Document Images from Digital Library

  • Li Zhang
  • Yue Lu
  • Chew Lim Tan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

This paper describes a web-based system for retrieving imaged documents from a digital library. First, image preprocessing is performed off-line on each imaged document to extract its word objects. Each word object is then represented by a string known as its feature code, from which a feature code file for the corresponding document is constructed. On the web interface side, the system lets the user enter a set of query words and choose whether to combine them with an "AND" or an "OR" operation. On receiving the user's request, the system processes each query word and combines the results according to the chosen operation. Each query word is first looked up in an index table that stores previously queried words. If matches are found, the results are retrieved directly from the index table and stored temporarily for subsequent merging. This speeds up searching and makes the system an incremental intelligence system. Otherwise, the system converts the query word to a feature code string and uses a partial word matching approach to search the pre-generated feature code files. Preliminary experimental results on imaged student theses provided by our digital library show that the proposed system is efficient and promising for document image retrieval, and thus has potential applications in digital libraries.
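
The query flow in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names here (`to_feature_code`, `search`, `index_table`, `documents`) are hypothetical. The feature coding is a toy stand-in that works on typed characters, whereas the paper derives codes from word images. Partial word matching is approximated by substring containment. Writing fresh results back into the index table is an assumption suggested by the "incremental" description.

```python
from typing import Dict, List, Set

ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def to_feature_code(word: str) -> str:
    """Toy stand-in for the paper's feature code (assumption, not the real coding).

    Maps each letter to a coarse shape class:
    'A' = ascender, 'D' = descender, 'x' = anything else.
    """
    out = []
    for ch in word.lower():
        if ch in ASCENDERS:
            out.append("A")
        elif ch in DESCENDERS:
            out.append("D")
        else:
            out.append("x")
    return "".join(out)


def search(
    query_words: List[str],
    mode: str,
    documents: Dict[str, List[str]],
    index_table: Dict[str, Set[str]],
) -> Set[str]:
    """Return the ids of documents matching the query words under AND/OR.

    documents maps a document id to the feature codes of its word objects.
    index_table maps a previously queried word to its matching document ids.
    """
    per_word: List[Set[str]] = []
    for word in query_words:
        if word in index_table:
            # Word was queried before: reuse the stored result.
            hits = index_table[word]
        else:
            # Otherwise convert to a feature code and scan the code files.
            code = to_feature_code(word)
            hits = {
                doc_id
                for doc_id, codes in documents.items()
                # Substring containment approximates partial word matching.
                if any(code in c for c in codes)
            }
            # Assumed behaviour: remember the result for later queries.
            index_table[word] = hits
        per_word.append(hits)

    if not per_word:
        return set()
    if mode == "AND":
        return set.intersection(*per_word)
    if mode == "OR":
        return set().union(*per_word)
    raise ValueError("mode must be 'AND' or 'OR'")
```

With two toy documents, an "AND" query returns only documents that match every word and an "OR" query returns documents that match any word. Repeating a query word is served from `index_table` without rescanning the feature code files.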

Original language: English
Title of host publication: 2003 Conference on Computer Vision and Pattern Recognition Workshop, CVPRW 2003
Publisher: IEEE Computer Society
Pages: 27-34
Number of pages: 8
ISBN (Electronic): 0769519008
State: Published - 2003
Externally published: Yes
Event: Conference on Computer Vision and Pattern Recognition Workshop, CVPRW 2003 - Madison, United States
Duration: 16 Jun 2003 to 22 Jun 2003

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume: 3
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: Conference on Computer Vision and Pattern Recognition Workshop, CVPRW 2003
Country/Territory: United States
City: Madison
Period: 16/06/03 to 22/06/03
