跳到主要导航 跳到搜索 跳到主要内容

Free Lunch: Frame-level Contrastive Learning with Text Perceiver for Robust Scene Text Recognition in Lightweight Models

  • Hongjian Zhan
  • , Yangfu Li
  • , Yu Jie Xiong
  • , Umapada Pal
  • , Yue Lu*
  • *此作品的通讯作者
  • East China Normal University
  • Shanghai University of Engineering Science
  • Indian Statistical Institute

科研成果: 书/报告/会议事项章节会议稿件同行评审

摘要

Lightweight models play an important role in real-life applications, especially in the recent mobile device era. However, due to limited network scale and low-quality images, the performance of lightweight models on Scene Text Recognition (STR) tasks is still much to be improved. Recently, contrastive learning has shown its power in many areas, with promising performances without additional computational cost. Based on these observations, we propose a new efficient and effective frame-level contrastive learning (FLCL) framework for lightweight STR models. The FLCL framework consists of a backbone to extract basic features, a Text Perceiver Module (TPM) to focus on text-relevant representations, and a FLCL loss to update the network. The backbone can be any feature extraction architecture. The TPM is an innovative Mamba-based structure that is designed to suppress features irrelevant to the text content from the backbone. Unlike existing word-level contrastive learning, we look into the nature of the STR task and propose the frame-level contrastive learning loss, which can work well with the famous Connectionist Temporal Classification loss. We conduct experiments on six well-known STR benchmarks as well as a new low-quality dataset. Compared to vanilla contrastive learning and other non-parameter methods, the FLCL framework significantly outperforms others on all datasets, especially the low-quality dataset. In addition, character feature visualization demonstrates that the proposed method can yield more discriminative character features for visually similar characters, which also substantiates the efficacy of the proposed methods. Codes and the low-quality dataset will be available soon.

源语言英语
主期刊名MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
出版商Association for Computing Machinery, Inc
6202-6211
页数10
ISBN(电子版)9798400706868
DOI
出版状态已出版 - 28 10月 2024
活动32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, 澳大利亚
期限: 28 10月 20241 11月 2024

出版系列

姓名MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

会议

会议32nd ACM International Conference on Multimedia, MM 2024
国家/地区澳大利亚
Melbourne
时期28/10/241/11/24

指纹

探究 'Free Lunch: Frame-level Contrastive Learning with Text Perceiver for Robust Scene Text Recognition in Lightweight Models' 的科研主题。它们共同构成独一无二的指纹。

引用此