TY - JOUR
T1 - End-to-end learning of representations for instance-level document image retrieval
AU - Liu, Li
AU - Lu, Yue
AU - Suen, Ching Y.
N1 - Publisher Copyright: © 2023 Elsevier B.V.
PY - 2023/3
Y1 - 2023/3
N2 - Instance-level document image retrieval plays a vital role in many document image processing systems. An appropriate image representation is of paramount importance for effective retrieval. To this end, we propose an image representation that is well-suited for the instance-level document image retrieval task. A novel end-to-end three-stream Siamese network is presented to learn the image representation, which accepts a triplet: a query image, its matching image and its non-matching image. The network is trained to jointly minimize two types of loss: ranking loss and classification loss. By employing the ranking loss, the distance between the representations of the query image and its matching image can be explicitly forced to be smaller than that between the query image and its non-matching image. In addition, each stream of the network is further extended into a classification model to fully exploit the supervised information of each individual image. The cross-entropy loss is then employed for the classification model. After training, an arbitrary image can be fed to any stream of the network to generate its representation. Extensive comparison and ablation experiments on three datasets have demonstrated the effectiveness of the proposed image representation. The two types of loss have been shown to complement each other.
KW - Classification loss
KW - Image representation
KW - Instance-level document image retrieval
KW - Ranking loss
KW - Three-stream Siamese network
UR - https://www.scopus.com/pages/publications/85148894940
U2 - 10.1016/j.asoc.2023.110136
DO - 10.1016/j.asoc.2023.110136
M3 - Article
AN - SCOPUS:85148894940
SN - 1568-4946
VL - 136
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 110136
ER -