SQT: Debiased Visual Question Answering via Shuffling Question Types

  • Tianyu Huai
  • Shuwen Yang
  • Junhang Zhang
  • Guoan Wang
  • Xinru Yu
  • Tianlong Ma*
  • Liang He

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

6 Scopus citations

Abstract

Visual Question Answering (VQA) aims to answer a question about an image from the image-question pair. Current VQA models tend to derive answers from the question alone, ignoring the information in the image. This behavior is caused by bias. As previous studies indicate, the bias in VQA comes mainly from the text modality. Our analysis suggests that the question type is a crucial factor in bias formation. To break the shortcut from question type to answer, we propose Shuffling Question Types (SQT), a self-supervised method that reduces bias from the text modality. It addresses the language prior problem by mitigating question-to-answer bias without introducing external annotations. Moreover, we propose a new objective function for negative samples. Experimental results show that our approach achieves 61.76% accuracy on the VQA-CP v2 dataset, outperforming state-of-the-art self-supervised and supervised methods.
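
The abstract does not say how the shuffling is carried out, so the following is only a rough illustration of the general idea and not the authors' implementation. It assumes a question type can be approximated by a leading phrase, and it permutes those phrases across a batch so that the type no longer matches the rest of the question. The prefix list `QUESTION_TYPES` and the helpers `split_question` and `shuffle_question_types` are hypothetical names introduced here.

```python
import random

# Hypothetical question-type prefixes, chosen for illustration only.
# They are NOT the question-type list used by the paper or by VQA-CP v2.
QUESTION_TYPES = ["what color is", "how many", "is there", "what is"]


def split_question(question, question_types=QUESTION_TYPES):
    """Split a question into (type prefix, remainder).

    The prefix is an empty string when no known type matches.
    """
    q = question.lower().strip()
    # Try longer prefixes first so the most specific type wins.
    for qt in sorted(question_types, key=len, reverse=True):
        if q.startswith(qt):
            return qt, q[len(qt):].strip()
    return "", q


def shuffle_question_types(questions, rng):
    """Permute the type prefixes across a batch, keeping each remainder in place.

    A shuffled question may be used as a negative sample when its prefix
    differs from the original one (an assumption of this sketch).
    """
    parts = [split_question(q) for q in questions]
    prefixes = [prefix for prefix, _ in parts]
    rng.shuffle(prefixes)  # in-place permutation of the copied prefix list
    return [f"{p} {rest}".strip() for p, (_, rest) in zip(prefixes, parts)]


if __name__ == "__main__":
    batch = [
        "What color is the cat?",
        "How many dogs are there?",
        "Is there a car?",
    ]
    for original, shuffled in zip(batch, shuffle_question_types(batch, random.Random(0))):
        print(f"{original!r} -> {shuffled!r}")
```

Passing an explicit `random.Random` instance keeps the permutation reproducible. How the shuffled questions enter training, including the objective function for negative samples mentioned in the abstract, is not described here and is left out of the sketch.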

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher: IEEE Computer Society
Pages: 600-605
Number of pages: 6
ISBN (Electronic): 9781665468916
DOIs
State: Published - 2023
Event: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration: 10 Jul 2023 to 14 Jul 2023

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2023-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/Territory: Australia
City: Brisbane
Period: 10/07/23 to 14/07/23

Keywords

  • De-biasing
  • Self-supervised
  • Visual question answering
