Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and readability of low-resolution scene text images, thereby boosting the performance of the downstream recognition task. Two factors in scene text images, visual structure and semantic information, significantly affect recognition performance. To mitigate the effects of these factors, this paper proposes a Prior-Enhanced Attention Network (PEAN). Specifically, an attention-based modulation module is leveraged to understand scene text images by perceiving both local and global dependencies within images, regardless of the text shape. Meanwhile, a diffusion-based module is developed to enhance the text prior, offering better guidance for the SR network to generate SR images with higher semantic accuracy. Additionally, a multi-task learning paradigm is employed to optimize the network, enabling the model to generate legible SR images. As a result, PEAN establishes new state-of-the-art (SOTA) results on the TextZoom benchmark. Experiments are also conducted to analyze the importance of the enhanced text prior in improving the performance of the SR network. Code is available at https://github.com/jdfxzzy/PEAN.
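To illustrate the kind of global dependence the attention-based modulation module perceives, the sketch below implements generic single-head self-attention over flattened patch tokens. This is a minimal, hypothetical illustration of the underlying attention mechanism, not the paper's actual module; the token shape and absence of learned projections are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Toy single-head self-attention over patch tokens of shape (n, d).

    Every token attends to every other token, so the output at each
    position mixes information from the whole image -- the 'global
    dependence' that attention captures regardless of text shape.
    (No learned Q/K/V projections here; this is a simplification.)
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)   # (n, n) similarity matrix
    weights = softmax(scores, axis=-1)        # rows sum to 1
    return weights @ tokens                   # (n, d) attended tokens
```

In the full model, such attention layers would operate on feature maps of the low-resolution image rather than raw patches.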