Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective means of dealing with them. Currently, the TPS transformation parameters are computed purely from regressed text borders, so the result depends entirely on the quality of that regression. This ignores the text content and often yields unsatisfactory rectification for severely distorted text. In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism into text rectification for the first time. TPS++ formulates the parameter calculation as a joint process of foreground control point regression and content-based attention score estimation, with the attention scores computed by a specially designed gated-attention block. TPS++ builds a more flexible, content-aware rectifier that produces a more natural rectification, one that is easier for the subsequent recognizer to read. Moreover, TPS++ shares part of the feature backbone with the recognizer and performs rectification at the feature level rather than the image level, incurring only a small overhead in parameters and inference time. Experiments on public benchmarks show that TPS++ consistently improves recognition and achieves state-of-the-art accuracy. It also generalizes well across different backbones and recognizers. Code is at https://github.com/simplify23/TPS_PP.
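For orientation, the classical TPS parameter calculation that the abstract refers to can be sketched as follows. This is a minimal NumPy illustration of the standard border-driven TPS solve, not the TPS++ method: the attention score estimation and the gated-attention block are not shown, and the function names (`tps_kernel`, `fit_tps`, `apply_tps`) are hypothetical, not taken from the released code.

```python
import numpy as np

def tps_kernel(r2):
    """Radial basis U(r) = r^2 log r, written in terms of r^2, with U(0) = 0."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = 0.5 * r2[mask] * np.log(r2[mask])
    return out

def fit_tps(src, dst):
    """Solve the classical TPS linear system for K control points.

    src, dst: (K, 2) arrays of matching control points.
    Returns a (K + 3, 2) array: K kernel weights, then 3 affine terms.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    k = src.shape[0]
    # Pairwise squared distances between control points.
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((k, 1)), src])  # rows are [1, x, y]
    L = np.zeros((k + 3, k + 3))
    L[:k, :k] = tps_kernel(d2)
    L[:k, k:] = P
    L[k:, :k] = P.T
    rhs = np.zeros((k + 3, 2))
    rhs[:k] = dst
    return np.linalg.solve(L, rhs)

def apply_tps(params, src, pts):
    """Map arbitrary points pts (N, 2) with the fitted transformation."""
    src = np.asarray(src, dtype=float)
    pts = np.asarray(pts, dtype=float)
    k = src.shape[0]
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return tps_kernel(d2) @ params[:k] + P @ params[k:]
```

In this sketch the warp is determined entirely by the control point pairs, which is the dependence on regressed borders that the abstract identifies as the limitation. Evaluating `apply_tps(fit_tps(src, dst), src, src)` reproduces `dst` up to numerical error, since TPS interpolates its control points.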