Scene Text Image Super-resolution (STISR) aims to recover high-resolution (HR) scene text images with visually pleasing and readable text content from a given low-resolution (LR) input. Most existing works focus on recovering English text, which has relatively simple character structures, while little work has been done on the more challenging case of Chinese text, whose character structures are diverse and complex. In this paper, we propose a real-world Chinese-English benchmark dataset, namely Real-CE, for the task of STISR, with an emphasis on restoring structurally complex Chinese characters. The benchmark provides 1,935/783 real-world LR-HR text image pairs (containing 33,789 text lines in total) for training/testing in 2$\times$ and 4$\times$ zooming modes, complemented by detailed annotations, including detection boxes and text transcripts. Moreover, we design an edge-aware learning method, which provides structural supervision in both the image and feature domains, to effectively reconstruct the dense structures of Chinese characters. We conduct experiments on the proposed Real-CE benchmark and evaluate existing STISR models with and without our edge-aware loss. The benchmark, including data and source code, is available at https://github.com/mjq11302010044/Real-CE.
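To make the idea of edge-aware supervision more concrete, the following is a minimal sketch of an image-domain edge loss. It is not the method from the paper, whose exact formulation is not given in this abstract. It assumes a Sobel gradient magnitude as the edge map and an L1 distance between the edge maps of the super-resolved (SR) and HR images, added to a plain L1 pixel term. The names `edge_map`, `mean_abs_diff`, `edge_aware_loss`, and the `edge_weight` value are hypothetical, and the feature-domain supervision mentioned above is not covered.

```python
import math

# 3x3 Sobel kernels (assumed edge detector; the paper may use a different one).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def _clamp(i, n):
    """Clamp index i into [0, n - 1] to replicate border pixels."""
    return min(max(i, 0), n - 1)


def edge_map(img):
    """Sobel gradient magnitude of a 2-D grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[_clamp(y + dy - 1, h)][_clamp(x + dx - 1, w)]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


def mean_abs_diff(a, b):
    """Mean absolute difference (L1) between two equally sized 2-D images."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def edge_aware_loss(sr, hr, edge_weight=0.1):
    """Pixel L1 term plus a weighted L1 term on Sobel edge maps (hypothetical weighting)."""
    pixel_term = mean_abs_diff(sr, hr)
    edge_term = mean_abs_diff(edge_map(sr), edge_map(hr))
    return pixel_term + edge_weight * edge_term


# Toy example: a sharp vertical stroke boundary (HR) versus a flat gray guess (SR).
hr = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
sr = [[0.5] * 8 for _ in range(8)]
print(edge_aware_loss(hr, hr))
print(edge_aware_loss(sr, hr))
```

In this toy case the flat gray output is penalized by the edge term in addition to the pixel term, because it contains none of the stroke boundary present in the HR image. A loss of this general shape is one way to discourage blurred strokes in dense characters.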