We empirically study the effect of noise scheduling strategies on denoising diffusion generative models. There are three findings: (1) noise scheduling is crucial for performance, and the optimal schedule depends on the task (e.g., image size); (2) as the image size increases, the optimal noise schedule shifts towards a noisier one (due to increased redundancy in pixels); and (3) simply scaling the input data by a factor of $b$ while keeping the noise schedule function fixed (equivalent to shifting the logSNR by $\log b$) is a good strategy across image sizes. This simple recipe, when combined with the recently proposed Recurrent Interface Network (RIN), yields state-of-the-art pixel-based diffusion models for high-resolution images on ImageNet, enabling single-stage, end-to-end generation of diverse and high-fidelity images at 1024$\times$1024 resolution for the first time (without upsampling/cascades).
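The input-scaling strategy in finding (3) can be illustrated with a minimal sketch. It assumes a cosine schedule $\gamma(t) = \cos^2(\pi t / 2)$ and a forward process of the form $x_t = \sqrt{\gamma(t)}\, b\, x_0 + \sqrt{1 - \gamma(t)}\, \epsilon$; the schedule choice and the function names `cosine_gamma`, `diffuse`, and `log_snr_amplitude` are illustrative assumptions, not details stated in the abstract. The logSNR here is taken on signal and noise amplitudes, an assumption under which the shift is exactly $\log b$; a power-based definition would double it.

```python
import numpy as np


def cosine_gamma(t):
    """Assumed cosine schedule: gamma(t) = cos^2(pi/2 * t), for t in [0, 1]."""
    return np.cos(0.5 * np.pi * np.asarray(t, dtype=np.float64)) ** 2


def diffuse(x0, t, b=1.0, rng=None):
    """Sample x_t = sqrt(gamma) * b * x0 + sqrt(1 - gamma) * eps.

    The schedule gamma(t) is left untouched; only the input is scaled by b.
    Returns the noised sample and the noise that was added.
    """
    rng = np.random.default_rng() if rng is None else rng
    gamma = cosine_gamma(t)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(gamma) * b * x0 + np.sqrt(1.0 - gamma) * eps
    return x_t, eps


def log_snr_amplitude(t, b=1.0):
    """Log of the signal-to-noise amplitude ratio (assumed convention):
    log(b * sqrt(gamma) / sqrt(1 - gamma)).
    """
    gamma = cosine_gamma(t)
    return np.log(b) + 0.5 * (np.log(gamma) - np.log1p(-gamma))


if __name__ == "__main__":
    t = np.linspace(0.1, 0.9, 9)
    # A factor b < 1 lowers the logSNR at every t, i.e. a noisier schedule.
    shift = log_snr_amplitude(t, b=0.5) - log_snr_amplitude(t, b=1.0)
    print(shift)

    x0 = np.zeros((4, 8, 8, 3))
    x_t, eps = diffuse(x0, t=0.5, b=0.5, rng=np.random.default_rng(0))
    print(x_t.shape)
```

Choosing $b < 1$ moves the whole curve towards lower logSNR without changing the schedule function itself, which is the direction finding (2) associates with larger images.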