TY - GEN
T1 - Robust Cross-Modal Retrieval by Adversarial Training
AU - Zhang, Tao
AU - Sun, Shiliang
AU - Zhao, Jing
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Cross-modal retrieval is usually built on cross-modal representation learning, which extracts semantic information from cross-modal data. Recent work shows that cross-modal representation learning is vulnerable to adversarial attacks, even when large-scale pre-trained networks are used. Attacking the representation makes it simple to attack downstream tasks, especially cross-modal retrieval. Adversarial attacks on any modality easily lead to obvious retrieval errors, which makes improving the adversarial robustness of cross-modal retrieval a challenge. In this paper, we propose a robust cross-modal retrieval method (RoCMR), which generates adversarial examples for both the query modality and the candidate modality and performs adversarial training for cross-modal retrieval. Specifically, we generate adversarial examples for both the image and text modalities and train the model on benign and adversarial examples within a contrastive learning framework. We evaluate the proposed RoCMR on two datasets and show its effectiveness in defending against gradient-based attacks.
UR - https://www.scopus.com/pages/publications/85140729536
U2 - 10.1109/IJCNN55064.2022.9892637
DO - 10.1109/IJCNN55064.2022.9892637
M3 - Conference contribution
AN - SCOPUS:85140729536
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Joint Conference on Neural Networks, IJCNN 2022
Y2 - 18 July 2022 through 23 July 2022
ER -