Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcrafted and require manual tuning across different resolutions. In this work, we propose a principled way to design per-instance noise schedules for pixel diffusion, based on the image's spectral properties. By deriving theoretical bounds on the efficacy of minimum and maximum noise levels, we design "tight" noise schedules that eliminate redundant steps. During inference, we propose to conditionally sample such noise schedules. Experiments show that our noise schedules improve the generative quality of single-stage pixel diffusion models, particularly in the low-step regime.
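To make the idea of a spectrum-dependent noise range concrete, the following is a minimal sketch, not the bounds derived in this work. It assumes a common heuristic: Gaussian noise of standard deviation sigma has a flat power spectrum of sigma^2 (under an orthonormal FFT), so noise weaker than the weakest frequency band of an image hides nothing, and noise stronger than the strongest band hides everything. The function names `radial_power_spectrum`, `spectral_noise_range`, and `tight_schedule`, the log-uniform spacing, and the `eps` floor are all illustrative assumptions.

```python
import numpy as np


def radial_power_spectrum(image):
    """Radially averaged power spectrum of a 2D grayscale image (DC excluded)."""
    h, w = image.shape
    # Orthonormal FFT: white noise with variance s^2 has expected power s^2 per bin.
    spectrum = np.fft.fftshift(np.fft.fft2(image - image.mean(), norm="ortho"))
    power = np.abs(spectrum) ** 2
    # Integer distance of every frequency bin from the spectrum centre
    # (pixel-index radius; assumes a roughly square image).
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    mean_power = sums / np.maximum(counts, 1)
    # Keep radii 1 .. min(h, w) // 2 - 1, i.e. drop DC and the corner bins.
    return mean_power[1 : min(h, w) // 2]


def spectral_noise_range(image, eps=1e-8):
    """Heuristic (assumed) per-image noise range from the weakest/strongest band."""
    power = radial_power_spectrum(image)
    sigma_min = max(float(np.sqrt(power.min())), eps)
    sigma_max = max(float(np.sqrt(power.max())), sigma_min)
    return sigma_min, sigma_max


def tight_schedule(image, num_steps):
    """Log-uniformly spaced noise levels from sigma_max down to sigma_min."""
    sigma_min, sigma_max = spectral_noise_range(image)
    return np.geomspace(sigma_max, sigma_min, num_steps)


if __name__ == "__main__":
    # Synthetic image with most of its energy at low frequencies.
    rng = np.random.default_rng(0)
    img = rng.standard_normal((64, 64)).cumsum(axis=0).cumsum(axis=1)
    print(spectral_noise_range(img))
    print(tight_schedule(img, num_steps=8))
```

Under this heuristic, steps outside the returned range would be the redundant ones: they either leave every frequency band visible or leave none. An image with little high-frequency energy would therefore receive a different range than a highly textured one.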