Diffusion-based Generative Models (DGMs) have achieved unparalleled performance in synthesizing high-quality visual content, creating an opportunity to improve image super-resolution (SR). Recent solutions either train architecture-specific DGMs from scratch or require iterative fine-tuning and distillation of pre-trained DGMs, both of which demand considerable time and hardware investment. More seriously, because these DGMs are built for a discrete, pre-defined upsampling scale, they are poorly suited to the emerging requirements of arbitrary-scale super-resolution (ASSR), where a single unified model adapts to arbitrary upsampling scales instead of a separate model being prepared for each scale. These limitations raise an intriguing question: can we identify the ASSR capability of existing pre-trained DGMs without distillation or fine-tuning? In this paper, we take a step towards answering this question by proposing Diff-SR, the first ASSR approach based solely on pre-trained DGMs, without additional training. It is motivated by the finding that a simple methodology, which injects a specific amount of noise into the low-resolution image before invoking a DGM's backward diffusion process, outperforms current leading solutions. The key insight is that the amount of injected noise must be chosen carefully: too little noise leads to poor low-level fidelity, while too much degrades the high-level signature. Through a fine-grained theoretical analysis, we propose the Perceptual Recoverable Field (PRF), a metric that achieves the optimal trade-off between these two factors. Extensive experiments verify the effectiveness, flexibility, and adaptability of Diff-SR, demonstrating superior performance to state-of-the-art solutions under diverse ASSR settings.
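The noise-then-denoise idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the Diff-SR implementation: it assumes a DDPM-style linear noise schedule, a deterministic DDIM-style reverse pass, nearest-neighbour resizing to reach the target scale, and a hypothetical `placeholder_eps_model` standing in for a pre-trained noise predictor. The injection step `t_inject` is picked by hand here; the PRF analysis that selects it is not reproduced.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def upsample_nearest(lr, scale):
    """Resize to an arbitrary (possibly non-integer) scale by nearest-neighbour indexing."""
    h, w = lr.shape[:2]
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return lr[rows][:, cols]

def inject_noise(x0, t, alpha_bar, rng):
    """Forward-diffuse x0 to step t: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def reverse_from(x_t, t_start, alpha_bar, eps_model):
    """Deterministic (DDIM-style, eta = 0) reverse pass from t_start down to 0."""
    x = x_t
    for t in range(t_start, -1, -1):
        eps_hat = eps_model(x, t)
        # Estimate the clean image implied by the current noise prediction.
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
        if t == 0:
            return x0_hat
        a_prev = alpha_bar[t - 1]
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x

def placeholder_eps_model(x, t):
    """Hypothetical stand-in for a pre-trained noise predictor; always predicts zero noise."""
    return np.zeros_like(x)

def super_resolve(lr, scale, t_inject, eps_model, seed=0):
    """Upsample, inject noise up to step t_inject, then denoise back to step 0."""
    rng = np.random.default_rng(seed)
    alpha_bar = linear_alpha_bar()
    x0 = upsample_nearest(lr, scale)
    x_t = inject_noise(x0, t_inject, alpha_bar, rng)
    return reverse_from(x_t, t_inject, alpha_bar, eps_model)
```

With a real pre-trained denoiser in place of the placeholder, a small `t_inject` would leave the reverse pass little room to add detail, while a large one would discard much of the input's content, which is the trade-off the abstract describes.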