End-to-end learning of representations for instance-level document image retrieval

  • Li Liu*
  • Yue Lu
  • Ching Y. Suen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Instance-level document image retrieval plays a vital role in many document image processing systems, and an appropriate image representation is of paramount importance for effective retrieval. To this end, we propose an image representation that is well suited to the instance-level document image retrieval task. A novel end-to-end three-stream Siamese network is presented to learn the representation; it accepts a triplet consisting of a query image, its matching image, and its non-matching image. The network is trained to jointly minimize two types of loss: a ranking loss and a classification loss. The ranking loss explicitly forces the distance between the representations of the query image and its matching image to be smaller than that between the query image and its non-matching image. In addition, each stream of the network is extended into a classification model to fully exploit the supervised information of each individual image, with the cross-entropy loss employed for this model. After training, an arbitrary image can be fed to any stream of the network to generate its representation. Extensive comparison and ablation experiments on three datasets demonstrate the effectiveness of the proposed image representation, and the two types of loss are shown to complement each other.
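The joint objective described above can be sketched in a few lines. This is a minimal illustration only: the margin value, the loss weighting `alpha`, and the function names are hypothetical, since the abstract does not specify them; the sketch assumes a standard margin-based triplet ranking loss combined with per-stream cross-entropy classification losses.

```python
import numpy as np

def ranking_loss(q, p, n, margin=0.1):
    """Triplet ranking loss: push d(query, match) + margin below d(query, non-match).

    q, p, n are embedding vectors from the three streams (margin is a
    hypothetical hyperparameter, not taken from the paper)."""
    d_pos = np.linalg.norm(q - p)
    d_neg = np.linalg.norm(q - n)
    return max(0.0, d_pos - d_neg + margin)

def cross_entropy(logits, label):
    """Classification loss for one stream's class prediction (log-softmax form)."""
    z = logits - logits.max()                 # subtract max for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def joint_loss(q, p, n, logits, labels, alpha=1.0):
    """Weighted sum of the ranking loss over the triplet and the
    cross-entropy losses of the three streams (alpha is hypothetical)."""
    l_rank = ranking_loss(q, p, n)
    l_cls = sum(cross_entropy(lg, lb) for lg, lb in zip(logits, labels))
    return l_rank + alpha * l_cls
```

When the matching image is already much closer to the query than the non-matching one, the ranking term is zero and only the classification terms drive the update, which is one way the two losses can complement each other.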

Original language: English
Article number: 110136
Journal: Applied Soft Computing
Volume: 136
State: Published - Mar 2023

Keywords

  • Classification loss
  • Image representation
  • Instance-level document image retrieval
  • Ranking loss
  • Three-stream Siamese network
