Scene text recognition (STR) methods have long struggled to attain both high accuracy and fast inference speed. Autoregressive (AR)-based models perform recognition character by character, achieving superior accuracy at the cost of slow inference. Alternatively, parallel decoding (PD)-based models infer all characters in a single decoding pass, offering faster inference but generally lower accuracy. We first present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context but also provides guidance on visual context perception. Consequently, we propose the Context Perception Parallel Decoder (CPPD), which predicts the character sequence in a single PD pass. CPPD devises a character counting module that infers the occurrence count of each character, and a character ordering module that deduces the content-free reading order and placeholders. The character prediction task then associates each placeholder with a character; together, these components build a comprehensive recognition context. We construct a series of CPPD models and also plug the proposed modules into existing STR decoders. Experiments on both English and Chinese benchmarks demonstrate that the CPPD models achieve highly competitive accuracy while running approximately 8x faster than their AR-based counterparts. Moreover, the plugged models achieve significant accuracy improvements. Code is at \href{https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/algorithm_rec_cppd_en.md}{this https URL}.
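To make the decoder structure concrete, below is a minimal PyTorch-style sketch of a parallel decoder with the three components the abstract names: a counting head, content-free positional queries acting as ordered placeholders, and a character prediction head that decodes all positions in one pass. All layer names, sizes, and the single cross-attention layer are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CPPDSketch(nn.Module):
    """Hypothetical sketch of a context-perception parallel decoder.
    Not the paper's implementation; see the linked repo for the real code."""
    def __init__(self, d_model=256, num_classes=37, max_len=25):
        super().__init__()
        # Character counting module: predicts the occurrence count of each class.
        self.counting_head = nn.Linear(d_model, num_classes)
        # Character ordering module: content-free positional queries that attend
        # to visual features, yielding ordered placeholders.
        self.pos_queries = nn.Parameter(torch.randn(1, max_len, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8,
                                                batch_first=True)
        # Character prediction: associates each placeholder with a character.
        self.char_head = nn.Linear(d_model, num_classes + 1)  # +1 for <eos>

    def forward(self, visual_feats):
        # visual_feats: (B, N, d_model) features from a vision encoder.
        B = visual_feats.size(0)
        # Counting: pool visual features, predict per-class counts.
        counts = self.counting_head(visual_feats.mean(dim=1))
        # Ordering: placeholders gather visual context in one parallel pass.
        queries = self.pos_queries.expand(B, -1, -1)
        placeholders, _ = self.cross_attn(queries, visual_feats, visual_feats)
        # Prediction: all characters decoded simultaneously (no AR loop).
        logits = self.char_head(placeholders)  # (B, max_len, num_classes + 1)
        return logits, counts

# Usage with dummy encoder features:
decoder = CPPDSketch()
feats = torch.randn(2, 196, 256)        # e.g. 14x14 patch features
logits, counts = decoder(feats)
print(logits.shape, counts.shape)       # (2, 25, 38), (2, 37)
```

The key property this sketch illustrates is that, unlike an AR decoder, nothing in the forward pass depends on previously decoded characters, so all positions are predicted in a single step; the counting and ordering signals supply the context that the AR loop would otherwise provide.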