Classical generative diffusion models learn an isotropic Gaussian denoising process, treating all spatial regions uniformly and thus neglecting potentially valuable structural information in the data. Inspired by the long-established work on anisotropic diffusion in image processing, we present a novel edge-preserving diffusion model that generalizes denoising diffusion probabilistic models (DDPM). In particular, we introduce an edge-aware noise scheduler that varies between edge-preserving and isotropic Gaussian noise. We show that our model's generative process converges faster to results that more closely match the target distribution. We demonstrate that it better learns the low-to-mid frequencies of the dataset, which play a crucial role in representing shapes and structural information. Our edge-preserving diffusion process consistently outperforms state-of-the-art baselines in unconditional image generation. It is also more robust for generative tasks guided by a shape-based prior, such as stroke-to-image generation. We present qualitative and quantitative results showing consistent improvements of up to 30% in FID score on both tasks.
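As a rough intuition for the edge-aware noise scheduler described above, the sketch below damps the per-pixel noise variance on strong image edges early in the forward process and interpolates toward standard isotropic Gaussian noise at late timesteps. The edge detector (gradient magnitude), the linear interpolation schedule `lam`, and all function names are illustrative assumptions for this sketch, not the paper's actual formulation.

```python
import numpy as np

def edge_map(img):
    # Approximate edge strength as gradient magnitude, normalized to [0, 1].
    gy, gx = np.gradient(img)
    mag = np.sqrt(gx**2 + gy**2)
    return mag / (mag.max() + 1e-8)

def edge_aware_noise(img, t, T, rng=None):
    """Sample per-pixel Gaussian noise whose std is damped at edges.

    lam(t) interpolates from 1 (fully edge-preserving) at t = 0 to 0
    (isotropic) at t = T, so late timesteps recover standard DDPM noise.
    The linear schedule here is a placeholder assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = 1.0 - t / T                  # hypothetical interpolation weight
    scale = 1.0 - lam * edge_map(img)  # smaller noise std on strong edges
    return scale * rng.standard_normal(img.shape)
```

At `t = T` the scale field is identically 1, so the sample is plain isotropic Gaussian noise; at `t = 0` the noise is suppressed exactly where the edge map is strong, preserving structural information longer in the forward process.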