Guidance is a crucial technique for extracting the best performance out of image-generating diffusion models. Traditionally, a constant guidance weight has been applied throughout the sampling chain of an image. We show that guidance is clearly harmful toward the beginning of the chain (high noise levels), largely unnecessary toward the end (low noise levels), and only beneficial in the middle. We thus restrict it to a specific range of noise levels, improving both the inference speed and result quality. This limited guidance interval improves the record FID in ImageNet-512 significantly, from 1.81 to 1.40. We show that it is quantitatively and qualitatively beneficial across different sampler parameters, network architectures, and datasets, including the large-scale setting of Stable Diffusion XL. We thus suggest exposing the guidance interval as a hyperparameter in all diffusion models that use guidance.
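As a rough illustration of restricting guidance to a range of noise levels, the sketch below applies the standard classifier-free guidance combination only when the current noise level falls inside an interval, and otherwise returns the conditional prediction unchanged. The abstract does not spell out the formula, so the guidance expression, the fallback outside the interval, the function name `guided_denoise`, and the interval bounds `sigma_lo` and `sigma_hi` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def guided_denoise(denoise_cond, denoise_uncond, x, sigma, weight, sigma_lo, sigma_hi):
    """Hypothetical helper: apply guidance only for sigma in (sigma_lo, sigma_hi].

    Returns the denoised estimate and the number of network evaluations used.
    """
    d_cond = denoise_cond(x, sigma)
    if not (sigma_lo < sigma <= sigma_hi):
        # Outside the interval guidance is disabled, so the unconditional
        # network is never evaluated (one evaluation instead of two).
        return d_cond, 1
    d_uncond = denoise_uncond(x, sigma)
    # Assumed classifier-free guidance form: extrapolate from the
    # unconditional prediction toward the conditional one.
    return d_uncond + weight * (d_cond - d_uncond), 2

# Toy stand-ins for the two networks (assumptions, for demonstration only).
def toy_cond(x, sigma):
    return np.ones_like(x)

def toy_uncond(x, sigma):
    return np.zeros_like(x)

if __name__ == "__main__":
    x = np.zeros(4)
    # Illustrative noise levels: high, middle, low. The interval bounds
    # below are placeholder values, not numbers from the paper.
    for sigma in [80.0, 1.0, 0.01]:
        out, n_evals = guided_denoise(toy_cond, toy_uncond, x, sigma,
                                      weight=2.0, sigma_lo=0.2, sigma_hi=3.0)
        print(f"sigma={sigma}: first value={out[0]}, evaluations={n_evals}")
```

In this sketch the speed benefit comes from skipping the second network evaluation at every sampling step outside the interval, and the interval bounds become two extra hyperparameters alongside the guidance weight.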