Diffusion models have recently achieved outstanding results in image super-resolution. These methods typically inject the low-resolution (LR) image via ControlNet. In this paper, we first explore the temporal dynamics of information infusion through ControlNet, revealing that the LR input predominantly influences the initial stages of the denoising process. Leveraging this insight, we introduce a novel timestep-aware diffusion model that adaptively integrates features from both ControlNet and the pre-trained Stable Diffusion (SD) model. Our method strengthens the transmission of LR information in the early stages of diffusion to guarantee image fidelity, and relies more on the generative prior of the SD model in the later stages to enrich the details of the generated images. To train this model, we propose a timestep-aware training strategy that applies distinct losses at different timesteps, each acting on different modules. Experiments on benchmark datasets demonstrate the effectiveness of our method. Code: https://github.com/SleepyLin/TASR
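The timestep-aware fusion described above can be illustrated with a minimal sketch. This is not the paper's implementation: the sigmoid schedule, the function names, and the parameters `k` and `midpoint` are all hypothetical, chosen only to show how a fusion weight could favor the ControlNet branch at early (high-noise) timesteps and the SD branch at later ones.

```python
import math

def timestep_fusion_weight(t, T=1000, k=10.0, midpoint=0.5):
    """Hypothetical schedule for the weight on ControlNet (LR-conditioned) features.

    Early denoising steps (large t) emphasize LR guidance for fidelity;
    later steps (small t) shift weight toward the pre-trained SD branch so
    its generative prior can add detail. A sigmoid in normalized time is
    one simple way to realize such a monotone schedule.
    """
    s = t / T  # normalized timestep in [0, 1]; s is near 1 at the start of denoising
    return 1.0 / (1.0 + math.exp(-k * (s - midpoint)))

def fuse(control_feats, sd_feats, t, T=1000):
    """Blend per-layer features from the two branches at timestep t."""
    w = timestep_fusion_weight(t, T)
    return [w * c + (1.0 - w) * d for c, d in zip(control_feats, sd_feats)]

# At t = T the ControlNet weight is near 1 (fidelity-driven);
# at t = 0 it is near 0 (detail generation left to SD).
```

In the actual method the fusion weights are presumably learned and feature-dependent rather than following a fixed scalar schedule; the sketch only conveys the qualitative early-vs-late behavior.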