Owing to their powerful generative priors, pre-trained text-to-image (T2I) diffusion models have become increasingly popular for real-world image super-resolution. However, the input low-resolution (LR) images are often heavily degraded, and the resulting destruction of local structures can make the image semantics ambiguous. As a result, the content of the reconstructed high-resolution image may contain semantic errors, which deteriorates super-resolution performance. To address this issue, we present a semantics-aware approach that better preserves semantic fidelity in generative real-world image super-resolution. First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation. The hard semantic prompts are image tags, which aim to enhance the local perception ability of the T2I model, while the soft semantic prompts complement the hard ones by providing additional representation information. These semantic prompts encourage the T2I model to generate detailed and semantically accurate results. Furthermore, during inference, we integrate the LR image into the initial sampling noise to mitigate the diffusion model's tendency to generate excessive random details. Experiments show that our method reproduces more realistic image details and better preserves semantics. The source code of our method can be found at https://github.com/cswry/SeeSR.
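The abstract does not spell out how the LR image is integrated into the initial sampling noise. One plausible reading, offered here only as a sketch, is to apply the standard forward-diffusion noising formula to an LR latent at the final timestep, so that sampling starts from a noisy version of the LR content rather than from pure Gaussian noise. The function name `embed_lr_in_initial_noise`, the flattened list standing in for a latent, and the value of `alpha_bar_T` are all assumptions made for illustration and are not taken from the paper or its repository.

```python
import math
import random


def embed_lr_in_initial_noise(lr_latent, alpha_bar_T, rng):
    """Blend an LR latent into Gaussian noise (hypothetical helper).

    Uses the standard forward-diffusion form
        x_T = sqrt(alpha_bar_T) * lr_latent + sqrt(1 - alpha_bar_T) * eps,
    with eps drawn from N(0, 1). Whether the paper uses exactly this form
    is an assumption.
    """
    if not 0.0 <= alpha_bar_T <= 1.0:
        raise ValueError("alpha_bar_T must lie in [0, 1]")
    signal_scale = math.sqrt(alpha_bar_T)
    noise_scale = math.sqrt(1.0 - alpha_bar_T)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in lr_latent]


# Made-up flattened LR latent; a real latent would come from an image encoder.
lr_latent = [0.5, -1.0, 2.0, 0.0]
rng = random.Random(0)  # seeded for reproducibility

# A small alpha_bar_T keeps the start point mostly noise while retaining
# a faint trace of the LR content.
x_T = embed_lr_in_initial_noise(lr_latent, alpha_bar_T=0.01, rng=rng)
```

With `alpha_bar_T = 0` this reduces to the usual pure-noise initialization, and with `alpha_bar_T = 1` it returns the LR latent unchanged, so the single parameter controls how strongly the LR content anchors the starting point.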