SQT: Debiased Visual Question Answering via Shuffling Question Types

  • Tianyu Huai
  • Shuwen Yang
  • Junhang Zhang
  • Guoan Wang
  • Xinru Yu
  • Tianlong Ma*
  • Liang He
  • *Corresponding author for this work
  • East China Normal University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Visual Question Answering (VQA) aims to answer a question about an image from the image-question pair. Current VQA models tend to derive answers from the question alone, ignoring the information in the image. This behavior is caused by bias. As previous studies indicate, the bias in VQA comes mainly from the text modality. Our analysis suggests that the question type is a crucial factor in bias formation. To break the shortcut from question type to answer, we propose Shuffling Question Types (SQT), a self-supervised method that reduces bias from the text modality. SQT addresses the language prior problem by mitigating question-to-answer bias without introducing external annotations. Moreover, we propose a new objective function for negative samples. Experimental results show that our approach achieves 61.76% accuracy on the VQA-CP v2 dataset, outperforming the state of the art among both self-supervised and supervised methods.

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher: IEEE Computer Society
Pages: 600-605
Number of pages: 6
ISBN (Electronic): 9781665468916
DOI
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration: 10 Jul 2023 to 14 Jul 2023

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2023-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/Territory: Australia
City: Brisbane
Period: 10/07/23 to 14/07/23