The performance of single image super-resolution depends heavily on how high-frequency details are generated to complement low-resolution images. Recently, diffusion-based models have exhibited great potential in generating high-quality images for super-resolution tasks. However, existing models struggle to directly predict high-frequency information over a wide bandwidth when the high-resolution ground truth serves as the sole target for all sampling timesteps. To tackle this problem and achieve higher-quality super-resolution, we propose a novel Frequency Domain-guided multiscale Diffusion model (FDDiff), which decomposes the high-frequency information complementing process into finer-grained steps. In particular, a wavelet packet-based frequency complement chain is developed to provide multiscale intermediate targets with increasing bandwidth for the reverse diffusion process. FDDiff then guides the reverse diffusion process to progressively complement the missing high-frequency details over timesteps. Moreover, we design a multiscale frequency refinement network that predicts the required high-frequency components at multiple scales within one unified network. Comprehensive evaluations on popular benchmarks demonstrate that FDDiff outperforms prior generative methods, yielding higher-fidelity super-resolution results.
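The idea of multiscale intermediate targets with increasing bandwidth can be sketched in a toy 1D setting. The snippet below is a minimal illustration, not the paper's method: it uses a plain Haar wavelet decomposition (the paper uses 2D wavelet packets on images), and all function names are hypothetical. Each target rebuilds the signal from the coarsest approximation while zeroing out all detail bands above a given level, so successive targets add progressively finer frequency content.

```python
def haar_decompose(signal):
    """One level of Haar analysis: (approximation, detail) at half length."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert one Haar level: a+d and a-d recover the even/odd samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

def frequency_chain(signal, levels):
    """Band-limited targets of increasing bandwidth: targets[0] is the
    coarsest (lowest-frequency) version, targets[-1] the full signal.
    Signal length must be divisible by 2**levels."""
    approxes = [signal]
    details = []
    for _ in range(levels):
        a, d = haar_decompose(approxes[-1])
        approxes.append(a)
        details.append(d)
    targets = []
    for keep in range(levels + 1):
        # Reconstruct from the coarsest approximation, keeping only the
        # `keep` lowest-frequency detail bands and zeroing the rest.
        a = approxes[levels]
        for lvl in reversed(range(levels)):
            d = details[lvl] if (levels - lvl) <= keep else [0.0] * len(details[lvl])
            a = haar_reconstruct(a, d)
        targets.append(a)
    return targets
```

For an 8-sample signal with `levels=2`, `frequency_chain` returns three targets: a piecewise-constant low-frequency version, a mid-bandwidth version, and the original signal, mirroring how the reverse diffusion process would be guided toward targets of growing bandwidth.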