Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and readability of low-resolution scene text images, thereby boosting the performance of the downstream recognition task. Two factors in scene text images, visual structure and semantic information, significantly affect recognition performance. To address these factors, this paper proposes a Prior-Enhanced Attention Network (PEAN). Specifically, an attention-based modulation module is leveraged to understand scene text images by perceiving both the local and global dependencies of images, regardless of the shape of the text. Meanwhile, a diffusion-based module is developed to enhance the text prior, offering better guidance for the super-resolution (SR) network to generate SR images with higher semantic accuracy. Additionally, a multi-task learning paradigm is employed to optimize the network, enabling the model to generate legible SR images. As a result, PEAN establishes new state-of-the-art (SOTA) results on the TextZoom benchmark. Experiments are also conducted to analyze the importance of the enhanced text prior as a means of improving the performance of the SR network. Code will be made available at https://github.com/jdfxzzy/PEAN.