跳到主要导航 跳到搜索 跳到主要内容

Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding

  • Jiabo Ye
  • , Junfeng Tian
  • , Ming Yan
  • , Xiaoshan Yang
  • , Xuwu Wang
  • , Ji Zhang
  • , Liang He
  • , Xin Lin
  • East China Normal University
  • Alibaba Group Holding Ltd.
  • CASIA
  • Fudan University

科研成果: 书/报告/会议事项章节会议稿件同行评审

摘要

Visual grounding focuses on establishing fine-grained alignment between vision and natural language, which has essential applications in multimodal reasoning systems. Existing methods use pre-trained query-agnostic visual backbones to extract visual feature maps independently without considering the query information. We argue that the visual features extracted from the visual backbones and the features really needed for multimodal reasoning are inconsistent. One reason is that there are differences between pre-training tasks and visual grounding. Moreover, since the backbones are query-agnostic, it is difficult to completely avoid the inconsistency issue by training the visual backbone end-to-end in the visual grounding framework. In this paper, we propose a Query-modulated Refinement Network (QRNet) to address the inconsistent issue by adjusting intermediate features in the visual backbone with a novel Query-aware Dynamic Attention (QD-ATT) mechanism and query-aware multiscale fusion. The QD-ATT can dynamically compute query-dependent visual attention at the spatial and channel levels of the feature maps produced by the visual backbone. We apply the QRNet to an end-to-end visual grounding framework. Extensive experiments show that the proposed method outperforms state-of-the-art methods on five widely used datasets. Our code is available at https://github.com/LukeForeverYoung/QRNet.

源语言英语
主期刊名Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
出版商IEEE Computer Society
15481-15491
页数11
ISBN(电子版)9781665469463
DOI
出版状态已出版 - 2022
活动2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, 美国
期限: 19 6月 202224 6月 2022

出版系列

姓名Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
2022-June
ISSN(印刷版)1063-6919

会议

会议2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
国家/地区美国
New Orleans
时期19/06/2224/06/22

指纹

探究 'Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding' 的科研主题。它们共同构成独一无二的指纹。

引用此