Beware of Overcorrection: Scene-induced Commonsense Graph for Scene Graph Generation

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

6 Scopus citations

Abstract

A scene graph generation task is largely restricted under a class imbalance. Previous methods have alleviated the class imbalance problem by incorporating commonsense information into the classification, enabling the prediction model to rectify the incorrect head class into the correct tail class. However, the results of commonsense-based models are typically overcorrected, e.g., the visually correct head class is forcibly modified into the wrong tail class. We argue that there are two principal reasons for this phenomenon. First, existing models ignore the semantic gap between commonsense knowledge and real scenes. Second, current commonsense fusion strategies propagate the neighbors in the visual-linguistic contexts without long-range correlation. To alleviate overcorrection, we formulate the commonsense-based scene graph generation task as two sub-problems: scene-induced commonsense graph generation (SI-CGG) and commonsense-inspired scene graph generation (CI-SGG). In SI-CGG module, unlike conventional methods using fixed commonsense graph, we adaptively adjust the node embeddings in a commonsense graph according to their visual appearance and configure the new reasoning edge under a specific visual context. The CI-SGG module is proposed to propagate the information from scene-induced commonsense graph back to the scene graph. It updates the representations of each node in scene graph by the aggregation of neighbourhood information at different scales. Through maximum likelihood optimisation of the logarithmic Gaussian process, the scene graph automatically adapt to the different neighbors in the visual-linguistic contexts. Systematic experiments on the Visual Genome dataset show that our full method achieves state-of-the-art performance.

Original languageEnglish
Title of host publicationMM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
PublisherAssociation for Computing Machinery, Inc
Pages2888-2897
Number of pages10
ISBN (Electronic)9798400701085
DOIs
StatePublished - 27 Oct 2023
Event31st ACM International Conference on Multimedia, MM 2023 - Ottawa, Canada
Duration: 29 Oct 20233 Nov 2023

Publication series

NameMM 2023 - Proceedings of the 31st ACM International Conference on Multimedia

Conference

Conference31st ACM International Conference on Multimedia, MM 2023
Country/TerritoryCanada
CityOttawa
Period29/10/233/11/23

Keywords

  • overcorrection
  • scene graph generation
  • scene-induced commonsense graph
  • visual-linguistic context

Fingerprint

Dive into the research topics of 'Beware of Overcorrection: Scene-induced Commonsense Graph for Scene Graph Generation'. Together they form a unique fingerprint.

Cite this