TY - GEN
T1 - Towards Reasoning Ability in Scene Text Visual Question Answering
AU - Wang, Qingqing
AU - Xiao, Liqiang
AU - Lu, Yue
AU - Jin, Yaohui
AU - He, Hao
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/10/17
Y1 - 2021/10/17
N2 - Works on scene text visual question answering (TextVQA) always emphasize the importance of reasoning questions and image contents. However, we find current TextVQA models lack reasoning ability and tend to answer questions by exploiting dataset bias and language priors. Moreover, our observations indicate that recent accuracy improvement in TextVQA is mainly contributed by stronger OCR engines, better pre-training strategies and more Transformer layers, instead of newly proposed networks. In this work, towards the reasoning ability, we 1) conduct module-wise contribution analysis to quantitatively investigate how existing works improve accuracies in TextVQA; 2) design a gradient-based explainability method to explore why TextVQA models answer what they answer and find evidence for their predictions; 3) perform qualitative experiments to visually analyze models' reasoning ability and explore potential reasons behind such a poor ability.
AB - Works on scene text visual question answering (TextVQA) always emphasize the importance of reasoning questions and image contents. However, we find current TextVQA models lack reasoning ability and tend to answer questions by exploiting dataset bias and language priors. Moreover, our observations indicate that recent accuracy improvement in TextVQA is mainly contributed by stronger OCR engines, better pre-training strategies and more Transformer layers, instead of newly proposed networks. In this work, towards the reasoning ability, we 1) conduct module-wise contribution analysis to quantitatively investigate how existing works improve accuracies in TextVQA; 2) design a gradient-based explainability method to explore why TextVQA models answer what they answer and find evidence for their predictions; 3) perform qualitative experiments to visually analyze models' reasoning ability and explore potential reasons behind such a poor ability.
KW - TextVQA
KW - explainability method
KW - quantitative and qualitative analysis
KW - reasoning ability
UR - https://www.scopus.com/pages/publications/85119375848
U2 - 10.1145/3474085.3475390
DO - 10.1145/3474085.3475390
M3 - Conference contribution
AN - SCOPUS:85119375848
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 2281
EP - 2289
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 29th ACM International Conference on Multimedia, MM 2021
Y2 - 20 October 2021 through 24 October 2021
ER -