Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and legibility of text images, and the quality of the resulting images significantly affects the performance of downstream tasks. Although considerable progress has been made, existing approaches suffer from two crucial issues: (1) They neglect the global structure of the text, which limits the semantic determinism of the scene text. (2) The priors employed in existing works, e.g., text prior or stroke prior, are extracted from pre-trained text recognizers. Such priors suffer from a domain gap, since the input images are low-resolution and blurred by poor imaging conditions, and they therefore provide incorrect guidance. Our work addresses these issues by proposing a plug-and-play module dubbed Dual Prior Modulation Network (DPMN), which leverages dual image-level priors to bring performance gains over existing approaches. Specifically, two types of prior-guided refinement modules are designed to improve the structural clarity and the semantic accuracy of the text, respectively; each takes the low-quality SR image from the preceding layer and uses either its text mask or its graphic recognition result as the prior. A subsequent attention mechanism then modulates the two quality-enhanced images to attain a superior SR result. Extensive experiments validate that our method improves image quality and boosts the performance of downstream tasks when applied to five typical approaches on the benchmark. Substantial visualizations and ablation studies demonstrate the advantages of the proposed DPMN. Code is available at: https://github.com/jdfxzzy/DPMN.
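To make the final modulation step concrete, the following is a minimal sketch of blending two quality-enhanced images with per-pixel attention weights. It is an illustration under stated assumptions, not the DPMN implementation: the function name `fuse_with_attention`, the use of a two-way softmax, and the score maps `score_a` and `score_b` (which in a real network would be predicted by learned layers) are all hypothetical.

```python
import math


def fuse_with_attention(img_a, img_b, score_a, score_b):
    """Blend two same-sized grayscale images with per-pixel softmax weights.

    img_a, img_b: nested lists (rows of floats), e.g. the structure-refined
        and semantics-refined images (hypothetical inputs).
    score_a, score_b: nested lists of unnormalized attention scores, one per
        pixel. Here they are given; in a real model they would be learned.
    """
    fused = []
    for row_a, row_b, srow_a, srow_b in zip(img_a, img_b, score_a, score_b):
        fused_row = []
        for a, b, sa, sb in zip(row_a, row_b, srow_a, srow_b):
            m = max(sa, sb)  # subtract the max for numerical stability
            ea, eb = math.exp(sa - m), math.exp(sb - m)
            w_a = ea / (ea + eb)  # softmax weight of image A at this pixel
            fused_row.append(w_a * a + (1.0 - w_a) * b)
        fused.append(fused_row)
    return fused
```

With equal scores the result is the plain average of the two images; as one score dominates at a pixel, the output at that pixel approaches the corresponding input image.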