Towards Reasoning Ability in Scene Text Visual Question Answering

  • Qingqing Wang
  • Liqiang Xiao
  • Yue Lu
  • Yaohui Jin*
  • Hao He
  • *Corresponding author for this work
  • Shanghai Jiao Tong University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Works on scene text visual question answering (TextVQA) consistently emphasize the importance of reasoning over questions and image contents. However, we find that current TextVQA models lack reasoning ability and tend to answer questions by exploiting dataset bias and language priors. Moreover, our observations indicate that recent accuracy improvements in TextVQA come mainly from stronger OCR engines, better pre-training strategies, and more Transformer layers, rather than from newly proposed networks. In this work, towards reasoning ability, we 1) conduct a module-wise contribution analysis to quantitatively investigate how existing works improve accuracy in TextVQA; 2) design a gradient-based explainability method to explore why TextVQA models answer what they answer and to find evidence for their predictions; 3) perform qualitative experiments to visually analyze models' reasoning ability and explore potential reasons behind its weakness.

Original language: English
Title of host publication: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 2281-2289
Number of pages: 9
ISBN (Electronic): 9781450386517
DOI
Publication status: Published - 17 Oct 2021
Event: 29th ACM International Conference on Multimedia, MM 2021 - Virtual, Online, China
Duration: 20 Oct 2021 to 24 Oct 2021

Publication series

Name: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

Conference

Conference: 29th ACM International Conference on Multimedia, MM 2021
Country/Territory: China
Virtual, Online
Period: 20/10/21 to 24/10/21
