TY - JOUR
T1 - Debiased Visual Question Answering via the perspective of question types
AU - Huai, Tianyu
AU - Yang, Shuwen
AU - Zhang, Junhang
AU - Zhao, Jiabao
AU - He, Liang
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/2
Y1 - 2024/2
N2 - Visual Question Answering (VQA) aims to answer questions about a given image. However, current VQA models tend to rely solely on textual information from the questions and ignore the visual information in the images when producing answers, a behavior caused by bias acquired during training. Previous studies have shown that bias in VQA stems mainly from the text modality, and our analysis suggests that question type is a crucial factor in bias formation. To address this bias, we propose a self-supervised method comprising two modules: the Against Biased Samples (ABS) module, which performs targeted debiasing by selecting samples that are prone to bias, and the Shuffle Question Types (SQT) module, which constructs negative samples by randomly replacing the question types of the samples selected by ABS, thereby interrupting the shortcut from question type to answer. Our approach mitigates question-to-answer bias without using external annotations, overcoming the language prior problem. Additionally, we design a new objective function for negative samples. Experimental results indicate that our method outperforms both self-supervised and supervised state-of-the-art approaches, achieving 70.36% accuracy on the VQA-CP v2 dataset.
KW - De-biasing
KW - Self-supervised
KW - Visual Question Answering
UR - https://www.scopus.com/pages/publications/85183456946
U2 - 10.1016/j.patrec.2024.01.009
DO - 10.1016/j.patrec.2024.01.009
M3 - Article
AN - SCOPUS:85183456946
SN - 0167-8655
VL - 178
SP - 181
EP - 187
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -