PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution

Scene text image super-resolution (STISR) aims to increase both the resolution and the readability of low-resolution scene text images, thereby improving the performance of downstream recognition. Two factors in scene text images, visual structure and semantic information, significantly affect recognition performance. To address both factors, this paper proposes a Prior-Enhanced Attention Network (PEAN). Specifically, an attention-based modulation module captures the local and global dependencies within an image, regardless of the shape of the text. Meanwhile, a diffusion-based module enhances the text prior, offering better guidance for the SR network to generate SR images with higher semantic accuracy. In addition, a multi-task learning paradigm is used to optimize the network, enabling the model to generate legible SR images. As a result, PEAN establishes new state-of-the-art (SOTA) results on the TextZoom benchmark. Experiments are also conducted to analyze how much the enhanced text prior contributes to the performance of the SR network. Code will be made available at https://github.com/jdfxzzy/PEAN.
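The abstract does not say which diffusion formulation is used to enhance the text prior. As a rough illustration only, the sketch below assumes the standard DDPM forward (noising) process applied to a small prior vector, and shows that a correct noise estimate lets the clean prior be recovered exactly. All names here (`linear_beta_schedule`, `add_noise`, `recover_clean`, the 1000-step schedule, and the three-element `text_prior`) are hypothetical and are not taken from the PEAN code.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per diffusion step (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(prior, alpha_bar_t, rng):
    """Forward step q(x_t | x_0): scale the clean prior and add Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in prior]
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    noisy = [a * x + s * e for x, e in zip(prior, noise)]
    return noisy, noise


def recover_clean(noisy, predicted_noise, alpha_bar_t):
    """Invert the forward step, given an estimate of the noise that was added."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [(x - s * e) / a for x, e in zip(noisy, predicted_noise)]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Hypothetical text prior: per-class scores for a single character position.
text_prior = [0.1, 0.7, 0.2]
noisy, noise = add_noise(text_prior, alpha_bars[499], rng)

# With the true noise standing in for a perfect denoiser, the prior is recovered.
restored = recover_clean(noisy, noise, alpha_bars[499])
```

In a trained model the true `noise` would be replaced by the output of a denoising network, typically conditioned on features of the low-resolution image. How PEAN conditions its denoiser is not described in the abstract.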
