Query Graph Attention for Video Relation Detection

  • Jian Wang*
  • Haibin Cai
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

As a bridge connecting vision and language, visual relations between objects provide a more comprehensive understanding of visual content beyond the objects themselves. Most previous works adopt the track-to-detect framework for video visual relation detection (VidVRD), which cannot capture long-term spatio-temporal contexts across its separate stages and also suffers from inefficiency. In this work, we propose a query-based method for video visual relation detection. Our model exploits graph structure to autoregressively generate relation graphs with spatio-temporal contexts and uses an attentional graph convolutional network to fuse those contexts. Experiments on the benchmark ImageNet-VidVRD dataset demonstrate the accuracy of our method.
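The abstract does not specify the exact form of the attentional graph convolutional network used to fuse spatio-temporal contexts. As a rough illustration of what such a layer computes, the sketch below implements a generic GAT-style attentional graph convolution in NumPy: each node's updated feature is an attention-weighted sum of its neighbors' projected features. The function name, signature, and parameter shapes are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def attentional_graph_conv(H, A, W, a_src, a_dst):
    """One GAT-style attentional graph-convolution layer (illustrative sketch,
    not the paper's exact model).

    H     : (N, d_in)  node features (e.g. object-track embeddings)
    A     : (N, N)     0/1 adjacency with self-loops
    W     : (d_in, d_out) shared linear projection
    a_src, a_dst : (d_out,) attention parameter vectors
    Returns (N, d_out) fused node features.
    """
    Z = H @ W                                        # project node features
    # Raw attention logits: e_ij = LeakyReLU(a_src . Z_i + a_dst . Z_j)
    s = (Z @ a_src)[:, None] + (Z @ a_dst)[None, :]  # (N, N)
    s = np.where(s > 0, s, 0.2 * s)                  # LeakyReLU, slope 0.2
    s = np.where(A > 0, s, -1e9)                     # mask non-edges
    # Row-wise softmax over each node's neighborhood
    attn = np.exp(s - s.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn @ Z                                  # attention-weighted aggregation
```

In a VidVRD setting, the nodes would correspond to relation queries or object tracklets and the edges to their spatio-temporal connections; stacking such layers lets context from the whole relation graph flow into each prediction.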

Original language: English
Title of host publication: International Conference on Image, Signal Processing, and Pattern Recognition, ISPP 2023
Editors: Paulo Batista, Ram Bilas Pachori
Publisher: SPIE
ISBN (Electronic): 9781510666351
DOIs
State: Published - 2023
Event: 2023 International Conference on Image, Signal Processing, and Pattern Recognition, ISPP 2023 - Changsha, China
Duration: 24 Feb 2023 - 26 Feb 2023

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 12707
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 2023 International Conference on Image, Signal Processing, and Pattern Recognition, ISPP 2023
Country/Territory: China
City: Changsha
Period: 24/02/23 - 26/02/23

Keywords

  • Graph convolutional network
  • Transformer
  • Video relation detection
