We empirically study the effect of noise scheduling strategies on denoising diffusion generative models. We report three findings: (1) noise scheduling is crucial for performance, and the optimal schedule depends on the task (e.g., image size); (2) as image size increases, the optimal noise schedule shifts toward a noisier one (due to increased redundancy in pixels); and (3) simply scaling the input data by a factor of $b$ while keeping the noise schedule function fixed (equivalent to shifting the logSNR by $\log b$) is a good strategy across image sizes. This simple recipe, when combined with the recently proposed Recurrent Interface Network (RIN), yields state-of-the-art pixel-based diffusion models for high-resolution images on ImageNet, enabling single-stage, end-to-end generation of diverse and high-fidelity images at 1024$\times$1024 resolution (without upsampling/cascades).
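The input-scaling strategy in finding (3) can be illustrated with a minimal sketch. The abstract does not say which schedule function is used, so the cosine schedule below is an assumption chosen for illustration, and the names `cosine_gamma` and `noise_input` are hypothetical. The sketch only shows the idea of multiplying the clean input by $b$ while leaving the schedule untouched; it is not the authors' implementation.

```python
import numpy as np


def cosine_gamma(t):
    """Assumed schedule: gamma(t) = cos(pi/2 * t)^2 for t in [0, 1]."""
    return np.cos(0.5 * np.pi * t) ** 2


def noise_input(x0, t, b=1.0, rng=None):
    """Noise a clean input x0 at time t, scaling x0 by b first.

    The schedule gamma(t) is unchanged; only the signal term is scaled,
    so b < 1 gives a noisier x_t at the same t.
    Returns the noised input and the noise sample that was used.
    """
    rng = np.random.default_rng() if rng is None else rng
    gamma = cosine_gamma(t)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(gamma) * b * x0 + np.sqrt(1.0 - gamma) * eps
    return x_t, eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # stand-in for an image
    for b in (1.0, 0.5, 0.2):
        x_t, _ = noise_input(x0, t=0.5, b=b, rng=rng)
        print(f"b={b}: std of x_t = {x_t.std():.3f}")
```

With `b=1.0` this reduces to the unscaled forward process under the assumed schedule. Smaller values of `b` shrink the signal term relative to the noise at every `t`, which matches the direction described in finding (2) for larger images.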