Visual Graph Reasoning Network

Dingbang Li, Xin Lin, Haibin Cai, Wenzhou Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Visual question answering (VQA) is a fundamental and challenging cross-modal task that requires a model to fully understand the content of an image and reason out the answer to a question about it. Existing VQA models understand visual content mainly through bottom-up features or grid features, and both types of visual features have drawbacks. Because bottom-up features are discrete and independent of one another, they prevent models from adequately performing relational reasoning. Grid features divide the image into fixed cells, which fragments meaningful visual regions and limits the model's cross-modal alignment capability. We therefore propose a more flexible representation called the Visual Graph. It connects image patches according to semantic similarity and spatial relevance to model their potential relationships, and it clusters adjacent homologous patches. Based on the Visual Graph, we design a Visual Graph Reasoning Network for VQA. We evaluate our model on GQA and VQA-v2, and the experimental results show that it achieves excellent performance among single models.
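The abstract states that the Visual Graph links patches by semantic similarity and spatial relevance and clusters adjacent homologous patches, but it does not give the exact scoring or clustering rules. The sketch below illustrates one way such a graph could be built, under assumptions that are ours rather than the authors': semantic similarity is cosine similarity compared against a fixed threshold, spatial relevance is Chebyshev distance on the patch grid, and "homologous" adjacent patches are merged with union-find. The function name `build_visual_graph` and the parameters `sem_thresh` and `radius` are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_visual_graph(features, width, sem_thresh=0.8, radius=1):
    """Hypothetical Visual Graph construction (assumed rules, not the paper's).

    features: per-patch feature vectors laid out row-major on a grid of `width` columns.
    Returns (edges, clusters):
      edges    -- set of (i, j), i < j, for patches that are semantically similar
                  (cosine >= sem_thresh) OR spatially close (Chebyshev distance <= radius)
      clusters -- sorted groups of patches merged because they are BOTH similar and close
    """
    n = len(features)
    pos = [(k // width, k % width) for k in range(n)]  # (row, col) of each patch
    parent = list(range(n))  # union-find forest for clustering

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = set()
    for i, j in combinations(range(n), 2):
        similar = cosine(features[i], features[j]) >= sem_thresh
        dist = max(abs(pos[i][0] - pos[j][0]), abs(pos[i][1] - pos[j][1]))
        near = dist <= radius
        if similar or near:
            edges.add((i, j))  # relation edge: semantic or spatial link
        if similar and near:
            parent[find(i)] = find(j)  # adjacent and similar: merge into one cluster

    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    clusters = sorted(sorted(g) for g in groups.values())
    return edges, clusters
```

For example, on a 1×4 strip of patches with features `[1, 0], [1, 0], [0, 1], [1, 0]`, patch 3 gets a semantic edge to patches 0 and 1 even though it is not adjacent to them, while only the adjacent, similar pair (0, 1) is merged into a cluster. Keeping relation edges and clustering as separate criteria is one way to read the abstract's two roles for the graph: modeling long-range relationships and regrouping fragmented regions.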

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
State: Published, 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023, Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23

Keywords

  • Cross-modal
  • Visual Graph
  • Visual Question Answering
  • Visual Reasoning
