Abstract
LSTM and attention mechanisms have been widely used for scene text recognition. However, existing LSTM-based recognizers usually convert 2D feature maps into 1D space by flattening or pooling operations, which discards the spatial information of text images. Additionally, the attention drift problem, where models fail to align targets with the proper feature regions, seriously degrades the recognition performance of existing models. To tackle these problems, we propose a scene text Recognizer with Encoded Location and Focused Attention, i.e., ReELFA. ReELFA uses one-hot encoded coordinates to indicate the spatial relationship of pixels, and character center masks to help focus attention on the right feature areas. Experiments conducted on the benchmark datasets IIIT5K, SVT, CUTE and IC15 demonstrate that the proposed method achieves comparable performance on regular, low-resolution and noisy text images, and outperforms state-of-the-art approaches on the more challenging curved text images.
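The idea of one-hot encoded coordinates can be illustrated with a minimal sketch. The abstract does not specify how ReELFA builds or injects the encoding, so everything below is an assumption made for illustration: the helper names `one_hot` and `append_coordinate_encoding` are hypothetical, the feature map is a plain nested list of shape H x W x C, and the coordinate codes are simply concatenated to each pixel's channel vector.

```python
def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def append_coordinate_encoding(feature_map):
    """Append one-hot row and column codes to every pixel's channel vector.

    feature_map: nested list of shape H x W x C.
    Returns a new nested list of shape H x W x (C + H + W).
    Assumption: concatenation is used here only for illustration; the
    paper may combine the encoding with the features differently.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    encoded = []
    for y, row in enumerate(feature_map):
        encoded_row = []
        for x, channels in enumerate(row):
            # Original channels, then the row code, then the column code.
            encoded_row.append(list(channels) + one_hot(y, height) + one_hot(x, width))
        encoded.append(encoded_row)
    return encoded


# Toy 2 x 3 feature map with a single channel per pixel.
features = [[[0.1], [0.2], [0.3]],
            [[0.4], [0.5], [0.6]]]
encoded = append_coordinate_encoding(features)
```

In this sketch each pixel vector grows from C to C + H + W entries, so a position stays identifiable even if the map is later flattened into a 1D sequence for an LSTM.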
| Original language | English |
|---|---|
| Pages | 71-76 |
| Number of pages | 6 |
| DOI | |
| Publication status | Published - 2019 |
| Event | 2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop - Sydney, Australia. Duration: 21 Sep 2019 → 22 Sep 2019 |
Conference
| Conference | 2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop |
|---|---|
| Country/Territory | Australia |
| City | Sydney |
| Period | 21/09/19 → 22/09/19 |
Fingerprint
Explore the research topics of 'ReELFA: A scene text recognizer with encoded location and focused attention'. Together they form a unique fingerprint.