TY - GEN
T1 - Enhancing Out-of-Distribution Generalization in VQA through Gini Impurity-guided Adaptive Margin Loss
AU - Yang, Shuwen
AU - Huai, Tianyu
AU - Wu, Anran
AU - Wu, Xingjiao
AU - Hu, Wenxin
AU - He, Liang
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In the context of Visual Question Answering (VQA), most methods are influenced by language bias, resulting in poor performance on out-of-distribution data. Recently, some works have attempted to use adaptive margin loss to address this bias. However, these works typically consider only the frequency of answer labels when designing the margin loss, causing some samples to be overly emphasized while others receive insufficient attention during training. To address this issue, we propose a novel Gini impurity-guided margin loss for VQA debiasing. By jointly considering label distribution and instance complexity, we use Gini impurity to adjust the margin values, balancing the model's attention across different samples. Importantly, our method is plug-and-play and can be applied directly to any baseline. On the VQA-CP v2 task, our results across various baselines surpass current state-of-the-art methods.
AB - In the context of Visual Question Answering (VQA), most methods are influenced by language bias, resulting in poor performance on out-of-distribution data. Recently, some works have attempted to use adaptive margin loss to address this bias. However, these works typically consider only the frequency of answer labels when designing the margin loss, causing some samples to be overly emphasized while others receive insufficient attention during training. To address this issue, we propose a novel Gini impurity-guided margin loss for VQA debiasing. By jointly considering label distribution and instance complexity, we use Gini impurity to adjust the margin values, balancing the model's attention across different samples. Importantly, our method is plug-and-play and can be applied directly to any baseline. On the VQA-CP v2 task, our results across various baselines surpass current state-of-the-art methods.
KW - Language bias
KW - Margin loss
KW - Visual question answering
UR - https://www.scopus.com/pages/publications/85206564829
U2 - 10.1109/ICME57554.2024.10688270
DO - 10.1109/ICME57554.2024.10688270
M3 - Conference contribution
AN - SCOPUS:85206564829
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
PB - IEEE Computer Society
T2 - 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Y2 - 15 July 2024 through 19 July 2024
ER -