Scene Graph Generation using Depth-based Multimodal Network

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Scene graph generation (SGG) provides an efficient way to represent a scene for understanding. However, it has been plagued by inaccurate classification of relative spatial relationships and by incorrect aggregation of feature information from distant objects. In this paper, we introduce the depth information of objects into SGG and propose a multimodal edge-featured graph attention network (MEGA-Net). MEGA-Net comprises three modules. First, the edge-aware message passing (EMP) module extracts multimodal features and fuses them into edge features of the graph network via a quadrilinear model. The multimodal features consist of depth, visual, spatial, and linguistic features. The depth feature in EMP provides the relative spatial relationships among objects, which prevents tail spatial predicates from being misclassified as head predicates. Second, we propose a depth-based self-supervised graph attention (DSGAT) module to predict the correlation probability between object pairs. By encoding the depth ranking of different object pairs in 2D images, DSGAT learns more accurate directional attention and avoids aggregating features from unrelated neighbors. Third, we introduce a predicate-aware loss (PA-Loss) to alleviate the feature-redundancy problem caused by the extra depth information; it incorporates semantic frequency information that reflects the priority among different types of relationships. Systematic experiments show that our method achieves state-of-the-art performance on two popular datasets, VG and VRD.
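The abstract does not specify the exact form of the quadrilinear model used by the EMP module. Purely as an illustrative sketch (not the authors' implementation), a common low-rank choice for fusing four modality features into a single edge feature is to project each modality into a shared space and take an elementwise product; all dimensions and names below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality input dimensions; d is the shared edge-feature size
d = 8
dims = {"depth": 4, "visual": 16, "spatial": 6, "linguistic": 10}

# One linear projection per modality into the shared d-dim space
weights = {name: rng.normal(size=(d, n)) for name, n in dims.items()}
feats = {name: rng.normal(size=n) for name, n in dims.items()}

# Quadrilinear fusion sketch: elementwise product of the four projections,
# yielding one d-dim edge feature for an object pair
fused = np.ones(d)
for name, x in feats.items():
    fused = fused * (weights[name] @ x)

print(fused.shape)  # (8,)
```

In a trained network the projection matrices would be learned parameters, and the fused vector would serve as the edge feature passed through the graph attention layers.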

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher: IEEE Computer Society
Pages: 1139-1144
Number of pages: 6
ISBN (Electronic): 9781665468916
DOIs
State: Published - 2023
Event: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration: 10 Jul 2023 – 14 Jul 2023

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2023-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/Territory: Australia
City: Brisbane
Period: 10/07/23 – 14/07/23

Keywords

  • Depth Information
  • Scene Graph Generation
  • Self-Supervised Graph Attention Network
