We empirically study the effect of noise scheduling strategies on denoising diffusion generative models. We report three findings: (1) noise scheduling is crucial for performance, and the optimal schedule depends on the task (e.g., image size); (2) as image size increases, the optimal noise schedule shifts towards a noisier one (due to increased redundancy in pixels); and (3) simply scaling the input data by a factor of $b$ while keeping the noise schedule function fixed (equivalent to shifting the logSNR by $\log b$) is a good strategy across image sizes. This simple recipe, when combined with the recently proposed Recurrent Interface Network (RIN), yields state-of-the-art pixel-based diffusion models for high-resolution images on ImageNet, enabling single-stage, end-to-end generation of diverse and high-fidelity images at 1024$\times$1024 resolution (without upsampling/cascades).
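The input-scaling strategy in finding (3) can be illustrated with a minimal sketch. It assumes the common forward process $x_t = \sqrt{\gamma(t)}\, b\, x_0 + \sqrt{1-\gamma(t)}\, \epsilon$ and a cosine schedule for $\gamma(t)$; neither is specified in the text above, and the function names are hypothetical. Note that the size of the shift depends on how SNR is defined: with an amplitude ratio the logSNR moves by $\log b$, while with a power ratio it moves by $2\log b$.

```python
import math


def gamma_cosine(t: float) -> float:
    """Illustrative cosine schedule gamma(t) for t strictly inside (0, 1).

    This particular schedule is an assumption chosen for the example.
    """
    return math.cos(0.5 * math.pi * t) ** 2


def log_snr(t: float, b: float = 1.0, power: bool = False) -> float:
    """logSNR of x_t = sqrt(gamma) * b * x0 + sqrt(1 - gamma) * eps.

    Assumes x0 and eps both have unit variance. With power=False the SNR is
    the amplitude ratio; with power=True it is the power (variance) ratio.
    """
    g = gamma_cosine(t)
    amplitude_ratio = b * math.sqrt(g) / math.sqrt(1.0 - g)
    log_ratio = math.log(amplitude_ratio)
    return 2.0 * log_ratio if power else log_ratio


def noised_sample(x0, eps, t: float, b: float = 1.0):
    """Apply the assumed forward process element-wise to two sequences."""
    g = gamma_cosine(t)
    return [math.sqrt(g) * b * x + math.sqrt(1.0 - g) * e for x, e in zip(x0, eps)]


if __name__ == "__main__":
    # Compare the unscaled schedule (b = 1) with a scaled-down input (b = 0.5).
    for t in (0.1, 0.5, 0.9):
        shift = log_snr(t, b=0.5) - log_snr(t, b=1.0)
        print(f"t={t:.1f}  logSNR shift (amplitude convention) = {shift:.4f}")
```

Under these assumptions, choosing $b < 1$ lowers the logSNR by the same amount at every $t$, which gives a uniformly noisier schedule without changing $\gamma(t)$ itself. This is consistent with finding (2), where larger images favour noisier schedules.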