Image super-resolution aims to reconstruct a high-fidelity, high-resolution counterpart of a low-resolution (LR) image. In recent years, diffusion-based models have attracted significant attention owing to the rich prior knowledge they encode. The success of diffusion models driven by general text prompts has validated the effectiveness of textual control in text-to-image generation. However, given the severe degradation commonly present in low-resolution images, coupled with the inherent randomness of diffusion models, current models struggle to adequately discern semantic and degradation information in severely degraded images. This often leads to problems such as semantic loss, visual artifacts, and visual hallucinations, which pose substantial challenges for practical use. To address these challenges, this paper proposes leveraging degradation-aligned language prompts for accurate, fine-grained, and high-fidelity image restoration. Two complementary priors are explored: semantic content descriptions and degradation prompts. On one hand, an image-restoration prompt alignment decoder is proposed to automatically discern the degree of degradation in LR images, thereby generating degradation priors that benefit image restoration. On the other hand, richly tailored descriptions from a pretrained multimodal large language model elicit high-level semantic priors closely aligned with human perception, ensuring fidelity control for image restoration. Comprehensive comparisons with state-of-the-art methods are conducted on several popular synthetic and real-world benchmark datasets. Quantitative and qualitative analyses demonstrate that the proposed method achieves a new state-of-the-art level of perceptual quality, especially in real-world cases as measured by reference-free metrics.