This paper explores the challenges and benefits of a trainable destruction process in diffusion samplers -- diffusion-based generative models trained to sample from an unnormalised density without access to data samples. In contrast to the majority of work, which views diffusion samplers as approximations to an underlying continuous-time model, we treat diffusion models as discrete-time policies trained to produce samples in very few generation steps. We propose to trade some of the elegance of the underlying theory for flexibility in the definition of the generation and destruction policies. In particular, we decouple the generation and destruction variances, enabling both transition kernels to be learned as unconstrained Gaussian densities. We show that, when the number of steps is limited, training both the generation and destruction processes yields faster convergence and improved sampling quality on various benchmarks. Through a thorough ablation study, we identify the design choices necessary for stable training. Finally, we demonstrate the scalability of our approach through experiments on GAN latent-space sampling for conditional image generation.
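To make the central idea concrete, the sketch below is a minimal illustration, assuming PyTorch; the names (GaussianPolicy, gen, destr, T) and the trajectory log-ratio objective are our own illustrative choices, not the paper's code. It parameterises the generation and destruction kernels as Gaussians whose means and unconstrained log-variances are predicted by separate networks, so the two variances are decoupled, and it accumulates the trajectory log-density ratio that divergence-based sampler objectives typically compare against the log of the target density.

```python
# Minimal sketch (assumed PyTorch; names are illustrative, not from the paper)
# of discrete-time generation and destruction policies with decoupled,
# learnable Gaussian variances.
import math
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Predicts mean and unconstrained log-variance of a Gaussian transition kernel."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * dim),  # per-coordinate mean and log-variance
        )

    def forward(self, x, t):
        # Condition on the current state and the scalar step index.
        h = self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))
        mean, log_var = h.chunk(2, dim=-1)
        return mean, log_var

def gaussian_log_prob(x, mean, log_var):
    # Diagonal-Gaussian log-density, summed over coordinates.
    return -0.5 * (log_var + (x - mean) ** 2 / log_var.exp()
                   + math.log(2 * math.pi)).sum(-1)

dim, T = 2, 8                # state dimension, number of generation steps
gen = GaussianPolicy(dim)    # generation (forward) policy
destr = GaussianPolicy(dim)  # destruction (backward) policy: variance is a
                             # free network output, not tied to gen's variance

# Sample one batch of generation trajectories and accumulate
# log p_gen(tau) - log p_destr(tau); a full training objective would combine
# this with the log of the unnormalised target density at the final state.
x = torch.zeros(16, dim)     # trajectories from a fixed start
log_ratio = torch.zeros(16)
for k in range(T):
    mean_f, log_var_f = gen(x, torch.tensor([k / T]))
    x_next = mean_f + log_var_f.mul(0.5).exp() * torch.randn_like(x)
    mean_b, log_var_b = destr(x_next, torch.tensor([(k + 1) / T]))
    log_ratio += gaussian_log_prob(x_next, mean_f, log_var_f)
    log_ratio -= gaussian_log_prob(x, mean_b, log_var_b)
    x = x_next
```

The point the abstract emphasises is visible in the last loop: log_var_b is produced by a separately trained network rather than being fixed by a noise schedule or tied to the generation variance, which is what makes the destruction process trainable.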