Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and readability of low-resolution scene text images, thereby boosting the performance of the downstream recognition task. Two factors in scene text images, visual structure and semantic information, significantly affect recognition performance. To mitigate the effects of these factors, this paper proposes a Prior-Enhanced Attention Network (PEAN). Specifically, an attention-based modulation module is leveraged to understand scene text images by perceiving both local and global dependencies within images, regardless of the text shape. Meanwhile, a diffusion-based module is developed to enhance the text prior, offering better guidance for the SR network to generate SR images with higher semantic accuracy. Additionally, a multi-task learning paradigm is employed to optimize the network, enabling the model to generate legible SR images. As a result, PEAN establishes new state-of-the-art (SOTA) results on the TextZoom benchmark. Experiments are also conducted to analyze the importance of the enhanced text prior in improving the performance of the SR network. Code is available at https://github.com/jdfxzzy/PEAN.
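To illustrate the kind of global dependence the attention-based modulation module perceives, the sketch below implements generic single-head self-attention over flattened patch tokens. This is a minimal, hypothetical illustration of the underlying attention mechanism, not the paper's actual module; the token shape and absence of learned projections are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Toy single-head self-attention over patch tokens of shape (n, d).

    Every token attends to every other token, so the output at each
    position mixes information from the whole image -- the 'global
    dependence' that attention captures regardless of text shape.
    (No learned Q/K/V projections here; this is a simplification.)
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)   # (n, n) similarity matrix
    weights = softmax(scores, axis=-1)        # rows sum to 1
    return weights @ tokens                   # (n, d) attended tokens
```

In the full model, such attention layers would operate on feature maps of the low-resolution image rather than raw patches.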