
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

  • Zhen Zhao
  • Jingqun Tang*
  • Chunhui Lin
  • Binghong Wu
  • Can Huang
  • Hao Liu
  • Xin Tan
  • Zhizhong Zhang
  • Yuan Xie*
  • *Corresponding author for this work
  • East China Normal University
  • ByteDance Ltd.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Scene text recognition (STR) in the wild frequently encounters challenges when coping with domain variations, font diversity, shape deformations, etc. A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios. Recent studies indicate that large language models (LLMs) can learn from a few demonstration examples in a training-free manner, termed 'In-Context Learning' (ICL). Nevertheless, applying LLMs as a text recognizer is unacceptably resource-consuming. Moreover, our pilot experiments on LLMs show that ICL fails in STR, mainly attributed to the insufficient incorporation of contextual information from diverse samples in the training stage. To this end, we introduce E²STR, an STR model trained with context-rich scene text sequences, where the sequences are generated via our proposed in-context training strategy. E²STR demonstrates that a regular-sized model is sufficient to achieve effective ICL capabilities in STR. Extensive experiments show that E²STR exhibits remarkable training-free adaptation in various scenarios and outperforms even the fine-tuned state-of-the-art approaches on public benchmarks. The code is released at https://github.com/bytedance/E2STR.

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher: IEEE Computer Society
Pages: 15567-15576
Number of pages: 10
ISBN (Electronic): 9798350353006
DOI
Publication status: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 2024 to 22 Jun 2024

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory: United States
City: Seattle
Period: 16/06/24 to 22/06/24
