Classical diffusion models typically rely on isotropic Gaussian noise, which treats all image regions uniformly and overlooks structural information important for high-quality generation. We introduce an edge-preserving diffusion process that generalizes isotropic models via a hybrid noise scheme, in which an edge-aware scheduler smoothly transitions from edge-preserving to isotropic noise. This enables the model to capture fine structural details while generally maintaining global performance. We evaluate the impact of structure-aware noise in both diffusion and flow-matching frameworks, and show that existing isotropic models can be efficiently fine-tuned with edge-preserving noise, making our framework practical for adapting pre-trained systems. Beyond unconditional generation, our method yields its largest gains in structure-guided tasks such as stroke-to-image synthesis, where it improves robustness and perceptual quality, as evidenced by consistent improvements across FID, KID, and CLIP score.
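To make the idea of a hybrid noise scheme concrete, the following is a minimal sketch, not the formulation used in this work. It assumes a Perona-Malik-style edge-stopping function for the per-pixel noise scale and a linear blend toward isotropic noise. The names `edge_aware_noise_scale`, `forward_sample`, `transition`, and `edge_sensitivity` are hypothetical and chosen only for illustration.

```python
import numpy as np


def edge_aware_noise_scale(x0, t, transition=0.5, edge_sensitivity=0.1):
    """Per-pixel noise scale in (0, 1] for a 2D image x0 at time t in [0, 1].

    Assumption: edges are detected from the gradient magnitude of the clean
    image, and noise is damped near edges early in the process.
    """
    gy, gx = np.gradient(x0.astype(np.float64))
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    # Edge-stopping term: close to 1 in flat regions, small on strong edges.
    edge_scale = 1.0 / np.sqrt(1.0 + (grad_mag / edge_sensitivity) ** 2)
    # Blend weight: 0 at t = 0 (edge-preserving), 1 for t >= transition (isotropic).
    w = np.clip(t / transition, 0.0, 1.0)
    return (1.0 - w) * edge_scale + w


def forward_sample(x0, t, alpha_bar, rng):
    """Draw a noisy sample x_t given the cumulative signal level alpha_bar at t."""
    eps = rng.standard_normal(x0.shape)
    scale = edge_aware_noise_scale(x0, t)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * scale * eps


# Toy image with a vertical step edge between the left and right halves.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

early = edge_aware_noise_scale(image, t=0.0)  # damped noise on the edge columns
late = edge_aware_noise_scale(image, t=1.0)   # uniform scale, i.e. isotropic noise
x_t = forward_sample(image, t=0.2, alpha_bar=0.8, rng=np.random.default_rng(0))
```

In this sketch the scale equals 1 everywhere once `t` reaches `transition`, so the process reduces to the standard isotropic forward process in its later steps. That is one simple way to realize a smooth handover from edge-preserving to isotropic noise.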