Diffusion-based super-resolution (SR) models have recently garnered significant attention due to their potent restoration capabilities. However, conventional diffusion models sample noise from a single distribution, constraining their ability to handle real-world scenes and complex textures across semantic regions. With the success of the Segment Anything Model (SAM), generating sufficiently fine-grained region masks can enhance the detail recovery of diffusion-based SR models. However, directly integrating SAM into an SR model incurs a much higher computational cost. In this paper, we propose SAM-DiffSR, which exploits the fine-grained structural information from SAM in the noise-sampling process to improve image quality without additional computational cost during inference. During training, we encode structural position information into the segmentation mask produced by SAM. The encoded mask is then integrated into the forward diffusion process by modulating the sampled noise with it. This adjustment allows us to independently adapt the noise mean within each corresponding segmentation area. The diffusion model is trained to estimate this modulated noise. Crucially, our proposed framework does NOT change the reverse diffusion process and does NOT require SAM at inference. Experimental results demonstrate the effectiveness of the proposed method, showing superior performance in suppressing artifacts and surpassing existing diffusion-based methods by up to 0.74 dB in PSNR on the DIV2K dataset. The code and dataset are available at https://github.com/lose4578/SAM-DiffSR.
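The core idea above can be sketched in a few lines: during the forward diffusion step, the per-region mask embedding shifts the mean of the sampled Gaussian noise, and the model's regression target becomes that shifted noise. The function below is a minimal illustrative sketch, assuming a standard DDPM-style closed-form forward step; the name `forward_diffusion_step`, the `mask_embedding` array, and the simple additive modulation are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def forward_diffusion_step(x0, t, alphas_cumprod, mask_embedding, rng):
    """Sample x_t ~ q(x_t | x_0) with mask-modulated noise (illustrative sketch).

    Instead of zero-mean Gaussian noise, the noise mean is shifted per
    segmentation region by `mask_embedding` (an encoded SAM mask, here
    simply an array of the same shape as x0 holding each region's offset).
    The reverse process is unchanged, so SAM is only needed at training time.
    """
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    eps_mod = eps + mask_embedding  # region-wise shift of the noise mean
    # Standard DDPM closed-form forward step, but with the modulated noise:
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps_mod
    # The denoiser is trained to predict eps_mod from (xt, t).
    return xt, eps_mod
```

In this sketch the training loss would simply be the usual epsilon-prediction MSE, with `eps_mod` as the target instead of `eps`; at inference the mask term is absent and sampling proceeds exactly as in a vanilla diffusion SR model.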