Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy under complex and unknown degradations. However, extracting degradation information from low-resolution images is challenging, which limits model performance. One feasible way to boost image SR performance is to introduce additional priors. Inspired by advances in multi-modal methods and text-prompted image processing, we introduce text prompts into image SR to provide degradation priors. Specifically, we first design a text-image generation pipeline that integrates text into the SR dataset through a text degradation representation and a degradation model. The text representation discretizes the degradation with a binning method so that it is described abstractly. This approach preserves the flexibility of text and is user-friendly. We also propose PromptSR to realize text-prompted SR. PromptSR uses a pre-trained language model (e.g., T5 or CLIP) to enhance restoration. We train the model on the generated text-image dataset. Extensive experiments indicate that introducing text prompts into SR yields excellent results on both synthetic and real-world images. Code is available at: https://github.com/zhengchen1999/PromptSR.
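To make the binning-based text representation more concrete, the following is a minimal sketch of how continuous degradation parameters might be discretized into a short text prompt. The level names, the parameter ranges, the uniform bin widths, and the helper names `bin_label` and `degradation_prompt` are all assumptions made for illustration; the abstract does not specify them, and the released code may differ.

```python
# Hypothetical level names; the abstract does not state the actual vocabulary.
LEVELS = ("light", "medium", "heavy")

# Assumed (low, high) parameter ranges per degradation type, for illustration only.
RANGES = {
    "blur": (0.2, 3.0),    # e.g. Gaussian kernel sigma
    "noise": (1.0, 30.0),  # e.g. Gaussian noise sigma
}


def bin_label(value, low, high, levels=LEVELS):
    """Map a continuous value in [low, high] to one of the equal-width bins."""
    if not low <= value <= high:
        raise ValueError(f"value {value} outside [{low}, {high}]")
    width = (high - low) / len(levels)
    # Clamp so that value == high falls into the last bin.
    index = min(int((value - low) / width), len(levels) - 1)
    return levels[index]


def degradation_prompt(params):
    """Build a comma-separated text prompt from {degradation name: value}."""
    parts = [f"{bin_label(v, *RANGES[name])} {name}" for name, v in params.items()]
    return ", ".join(parts)


if __name__ == "__main__":
    print(degradation_prompt({"blur": 0.5, "noise": 15.0}))
```

Because the prompt records only the bin rather than the exact parameter value, a user could write such a description by hand without knowing the precise degradation, which is one way to read the flexibility and user-friendliness claimed above.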