Diversity in length is a significant characteristic of text. Because text lengths follow a long-tail distribution, most existing methods for scene text recognition (STR) work well only on short text or on text of lengths seen during training; they cannot recognize longer text or extrapolate to unseen lengths. This is a crucial issue, since the length of the text to be recognized is usually not known in advance in real-world applications, yet it has not been adequately investigated in previous work. We therefore propose the Length-Insensitive Scene TExt Recognizer (LISTER), which addresses this lack of robustness to varying text lengths. Specifically, we propose a Neighbor Decoder that obtains accurate character attention maps with the help of a novel neighbor matrix, regardless of text length. In addition, we devise a Feature Enhancement Module that models long-range dependencies at low computational cost and can be iterated with the Neighbor Decoder to enhance the feature map progressively. To the best of our knowledge, we are the first to achieve effective length-insensitive scene text recognition. Extensive experiments demonstrate that LISTER is clearly superior at long text recognition and is capable of length extrapolation, while comparing favourably with previous state-of-the-art methods on standard STR benchmarks (mainly short text).