One-Stage Visual Grounding via Semantic-Aware Feature Filter

  • East China Normal University
  • Shanghai Key Laboratory of Multidimensional Information Processing

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Visual grounding has attracted much attention as vision-language research has grown in popularity. Existing one-stage methods are far faster than two-stage methods. However, they fuse the textual feature and the visual feature map by simple concatenation, which ignores textual semantics and limits the models' ability in cross-modal understanding. To overcome this weakness, we propose a semantic-aware framework that uses both the structured knowledge and the context-sensitive representations of each query to filter the visual feature maps and localize referents more accurately. Our framework contains an entity filter, an attribute filter, and a location filter. These three filters process the input visual feature map step by step, each according to the corresponding aspect of the query. A grounding module then regresses the bounding boxes to localize the referred object. Experiments on several commonly used datasets show that our framework achieves real-time inference speed and outperforms all state-of-the-art methods.
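
The step-by-step filtering described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how each filter is computed, so the sketch assumes a simple sigmoid gate on the dot product between each spatial cell and a query-aspect embedding. The names `apply_filter`, `filter_cascade`, and `strongest_cell` are hypothetical, and `strongest_cell` is only a stand-in for the grounding module, which in the paper regresses bounding boxes.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply_filter(feature_map, aspect_vec):
    # ASSUMPTION: a filter is modeled as a per-cell gate in (0, 1) that
    # grows with the similarity between the cell and the aspect embedding.
    filtered = []
    for row in feature_map:
        new_row = []
        for cell in row:
            gate = sigmoid(dot(cell, aspect_vec))
            new_row.append([gate * v for v in cell])
        filtered.append(new_row)
    return filtered


def filter_cascade(feature_map, entity_vec, attribute_vec, location_vec):
    # Apply the entity, attribute, and location filters one after another,
    # mirroring the order given in the abstract.
    out = feature_map
    for aspect_vec in (entity_vec, attribute_vec, location_vec):
        out = apply_filter(out, aspect_vec)
    return out


def strongest_cell(feature_map):
    # Stand-in for the grounding module (hypothetical): return the
    # (row, col) of the cell with the largest L2 norm.
    best, best_norm = None, -1.0
    for r, row in enumerate(feature_map):
        for c, cell in enumerate(row):
            norm = math.sqrt(dot(cell, cell))
            if norm > best_norm:
                best, best_norm = (r, c), norm
    return best


# A 2x2 feature map with 2-dimensional features (made-up values).
feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, -1.0]],
]
filtered = filter_cascade(feature_map, [2.0, 0.0], [0.0, 2.0], [1.0, 1.0])
print(strongest_cell(filtered))  # (1, 0)
```

In this toy input, only the cell at row 1, column 0 agrees with all three aspect vectors, so it keeps the most energy after the cascade. Cells that match a single aspect are attenuated at each of the other stages.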

Original language: English
Title of host publication: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 1702-1711
Number of pages: 10
ISBN (electronic): 9781450386517
Publication status: Published - 17 Oct 2021
Event: 29th ACM International Conference on Multimedia, MM 2021 - Virtual, Online, China
Duration: 20 Oct 2021 to 24 Oct 2021

Publication series

Name: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

Conference

Conference: 29th ACM International Conference on Multimedia, MM 2021
Country/Territory: China
Virtual, Online
Period: 20/10/21 to 24/10/21