Diffusion-based super-resolution (SR) models have recently garnered significant attention due to their potent restoration capabilities. However, conventional diffusion models sample noise from a single distribution, constraining their ability to handle real-world scenes and complex textures across semantic regions. With the success of the Segment Anything Model (SAM), generating sufficiently fine-grained region masks can enhance the detail recovery of diffusion-based SR models. However, directly integrating SAM into an SR model incurs a much higher computational cost. In this paper, we propose SAM-DiffSR, which exploits the fine-grained structural information from SAM in the noise-sampling process to improve image quality without additional computational cost during inference. During training, we encode structural position information into the segmentation mask produced by SAM. The encoded mask is then integrated into the forward diffusion process by modulating the sampled noise with it. This adjustment allows us to independently adapt the noise mean within each corresponding segmentation area. The diffusion model is trained to estimate this modulated noise. Crucially, our proposed framework does NOT change the reverse diffusion process and does NOT require SAM at inference. Experimental results demonstrate the effectiveness of the proposed method, showing superior performance in suppressing artifacts and surpassing existing diffusion-based methods by up to 0.74 dB in PSNR on the DIV2K dataset. The code and dataset are available at https://github.com/lose4578/SAM-DiffSR.
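The core idea above can be sketched in a few lines: during the forward diffusion step, the per-region mask embedding shifts the mean of the sampled Gaussian noise, and the model's regression target becomes that shifted noise. The function below is a minimal illustrative sketch, assuming a standard DDPM-style closed-form forward step; the name `forward_diffusion_step`, the `mask_embedding` array, and the simple additive modulation are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def forward_diffusion_step(x0, t, alphas_cumprod, mask_embedding, rng):
    """Sample x_t ~ q(x_t | x_0) with mask-modulated noise (illustrative sketch).

    Instead of zero-mean Gaussian noise, the noise mean is shifted per
    segmentation region by `mask_embedding` (an encoded SAM mask, here
    simply an array of the same shape as x0 holding each region's offset).
    The reverse process is unchanged, so SAM is only needed at training time.
    """
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    eps_mod = eps + mask_embedding  # region-wise shift of the noise mean
    # Standard DDPM closed-form forward step, but with the modulated noise:
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps_mod
    # The denoiser is trained to predict eps_mod from (xt, t).
    return xt, eps_mod
```

In this sketch the training loss would simply be the usual epsilon-prediction MSE, with `eps_mod` as the target instead of `eps`; at inference the mask term is absent and sampling proceeds exactly as in a vanilla diffusion SR model.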