Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

Diffusion-based Generative Models (DGMs) have achieved unparalleled performance in synthesizing high-quality visual content, opening up the opportunity to improve image super-resolution (SR) tasks. Recent solutions for these tasks often train architecture-specific DGMs from scratch, or require iterative fine-tuning and distillation of pre-trained DGMs, both of which demand considerable time and hardware investment. More seriously, since these DGMs are built for a discrete, pre-defined upsampling scale, they are poorly matched to the emerging requirements of arbitrary-scale super-resolution (ASSR), where a single unified model adapts to arbitrary upsampling scales instead of a series of distinct models being prepared for each case. These limitations raise an intriguing question: can we identify the ASSR capability of existing pre-trained DGMs without the need for distillation or fine-tuning? In this paper, we take a step towards resolving this matter by proposing Diff-SR, the first ASSR attempt based solely on pre-trained DGMs, without additional training effort. It is motivated by an exciting finding that a simple methodology, which first injects a specific amount of noise into the low-resolution images before invoking a DGM's backward diffusion process, outperforms current leading solutions. The key insight is determining a suitable amount of noise to inject: too little noise leads to poor low-level fidelity, while too much degrades the high-level signature. Through a fine-grained theoretical analysis, we propose the Perceptual Recoverable Field (PRF), a metric that achieves the optimal trade-off between these two factors. Extensive experiments verify the effectiveness, flexibility, and adaptability of Diff-SR, demonstrating superior performance to state-of-the-art solutions under diverse ASSR environments.
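The noise-then-denoise idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear beta schedule, the nearest-neighbour upsampling, the grayscale list-of-lists image format, and the names `linear_alpha_bar`, `upsample_nearest`, `inject_noise`, `diff_sr_sketch`, and `denoise_fn` are all assumptions made for illustration. In particular, `denoise_fn` is a hypothetical stand-in for a pre-trained DGM's backward diffusion process, and the timestep `t` is left as a free parameter, since the abstract does not say how the PRF selects the noise level.

```python
import math
import random


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def upsample_nearest(img, scale):
    """Resize a 2D grayscale image by an arbitrary (possibly non-integer) scale."""
    h, w = len(img), len(img[0])
    new_h, new_w = round(h * scale), round(w * scale)
    return [
        [img[min(int(r / scale), h - 1)][min(int(c / scale), w - 1)]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def inject_noise(img, t, alpha_bar, rng):
    """Forward-diffuse to step t: x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [[signal * px + noise * rng.gauss(0.0, 1.0) for px in row]
            for row in img]


def diff_sr_sketch(lr_img, scale, t, denoise_fn, seed=0):
    """Upsample, inject noise at step t, then hand off to the backward process.

    denoise_fn(noisy_img, t) is a hypothetical placeholder for the
    pre-trained model's reverse diffusion from step t down to 0.
    """
    rng = random.Random(seed)
    upsampled = upsample_nearest(lr_img, scale)
    noisy = inject_noise(upsampled, t, linear_alpha_bar(), rng)
    return denoise_fn(noisy, t)
```

In this sketch a small `t` keeps `alpha_bar[t]` close to 1, so the input passes through almost unchanged, while a large `t` drives it towards pure Gaussian noise. That is the trade-off the abstract describes between low-level fidelity and the high-level signature.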

