Diffusion models are expressive generative models that require a large number of time steps (inference steps) to generate a single image. To accelerate this slow process, reducing the steps uniformly is widely treated as an undisputed principle. We argue that this uniform assumption is not optimal in practice; that is, different models can have different optimal time steps. We therefore propose to search for the optimal time-step sequence and a compressed model architecture in a unified framework, achieving effective image generation for diffusion models without any further training. Specifically, we first design a unified search space that consists of all possible time steps and various architectures. A two-stage evolutionary algorithm is then introduced to find the optimal solution in this search space. To further accelerate the search, we use the FID score between generated and real samples to estimate the performance of each sampled candidate. As a result, the proposed method is (i) training-free, obtaining the optimal time steps and model architecture without any training process; (ii) orthogonal to most advanced diffusion samplers, and can be integrated with them to gain better sample quality; and (iii) generalized, in that the searched time steps and architectures can be directly applied to different diffusion models with the same guidance scale. Experimental results show that our method achieves excellent performance using only a few time steps, e.g., an FID score of 17.86 on ImageNet 64 $\times$ 64 with only four steps, compared to 138.66 with DDIM.
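To make the idea of searching for a non-uniform time-step sequence concrete, the following is a minimal sketch of an evolutionary search over step subsets. It is not the authors' implementation: it covers only the time-step part of the search space (no architecture search, and a single stage rather than two), and every name in it (`uniform_steps`, `mutate`, `crossover`, `evolve`, `toy_score`) is hypothetical. In particular, `toy_score` is a cheap stand-in for the FID evaluation; in a real setting the score function would generate samples with the candidate steps and compute FID against real images.

```python
import random
from typing import Callable, Tuple

Steps = Tuple[int, ...]  # a sorted tuple of distinct time-step indices


def uniform_steps(total: int, k: int) -> Steps:
    """Evenly strided baseline: k steps out of `total` (assumes 0 < k <= total)."""
    stride = total // k
    return tuple(range(0, total, stride))[:k]


def random_steps(total: int, k: int, rng: random.Random) -> Steps:
    """Draw k distinct steps uniformly at random and sort them."""
    return tuple(sorted(rng.sample(range(total), k)))


def mutate(steps: Steps, total: int, rng: random.Random) -> Steps:
    """Replace one randomly chosen step with a step not already present (assumes k < total)."""
    pool = [t for t in range(total) if t not in steps]
    child = list(steps)
    child[rng.randrange(len(child))] = rng.choice(pool)
    return tuple(sorted(child))


def crossover(a: Steps, b: Steps, rng: random.Random) -> Steps:
    """Sample a child of the same length from the union of two parents' steps."""
    merged = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(merged, len(a))))


def evolve(
    score: Callable[[Steps], float],  # lower is better, as with FID
    total: int,
    k: int,
    pop_size: int = 20,
    generations: int = 30,
    top_k: int = 5,
    seed: int = 0,
) -> Steps:
    """Single-stage evolutionary search with elitism, seeded with the uniform baseline."""
    rng = random.Random(seed)
    population = [uniform_steps(total, k)]
    population += [random_steps(total, k, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Keep the best distinct candidates; they survive unchanged (elitism).
        parents = sorted(set(population), key=score)[:top_k]
        children = list(parents)
        while len(children) < pop_size:
            if len(parents) >= 2 and rng.random() < 0.5:
                a, b = rng.sample(parents, 2)
                children.append(crossover(a, b, rng))
            else:
                children.append(mutate(rng.choice(parents), total, rng))
        population = children
    return min(population, key=score)


def toy_score(steps: Steps) -> float:
    """Hypothetical stand-in for FID: distance to an arbitrary non-uniform schedule."""
    target = (5, 15, 40, 90)
    return float(sum(abs(s - t) for s, t in zip(steps, target)))


if __name__ == "__main__":
    best = evolve(toy_score, total=100, k=4)
    print("uniform:", uniform_steps(100, 4), toy_score(uniform_steps(100, 4)))
    print("searched:", best, toy_score(best))
```

Because the uniform schedule is placed in the initial population and the best candidates are carried over unchanged each generation, the returned sequence never scores worse than the uniform baseline under the given score function. How far this carries over to a real FID-based score depends on how noisy and expensive that evaluation is.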