We derive a minimalist but powerful deterministic denoising-diffusion model. While denoising diffusion has shown great success in many domains, its underlying theory remains largely inaccessible to non-expert users. Indeed, an understanding of graduate-level concepts such as Langevin dynamics or score matching appears to be required to grasp how it works. We propose an alternative approach that requires no more than undergraduate calculus and probability. We consider two densities and observe what happens when random samples from these densities are blended (linearly interpolated). We show that iteratively blending and deblending samples produces random paths between the two densities that converge toward a deterministic mapping. This mapping can be evaluated with a neural network trained to deblend samples. We obtain a model that behaves like deterministic denoising diffusion: it iteratively maps samples from one density (e.g., Gaussian noise) to another (e.g., cat images). However, compared to the state-of-the-art alternative, our model is simpler to derive, simpler to implement, and more numerically stable; it achieves higher-quality results in our experiments and has interesting connections to computer graphics.
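The blend-then-deblend idea can be illustrated with a small toy sketch. This is not the paper's implementation: it assumes two independent 1D Gaussian densities, N(0, 1) and N(mu, s^2), and it swaps the trained neural network for the closed-form conditional expectation E[x1 - x0 | x_alpha], which is available only because both densities are Gaussian. The update rule (a plain Euler step along the expected deblending direction), the function names, and the parameters `mu`, `s`, and `steps` are all assumptions made for illustration.

```python
import numpy as np


def deblend_direction(x, alpha, mu, s):
    """Closed-form E[x1 - x0 | x_alpha = x] for independent
    x0 ~ N(0, 1) and x1 ~ N(mu, s^2), where
    x_alpha = (1 - alpha) * x0 + alpha * x1.

    Assumption: this stands in for a neural network trained to deblend.
    """
    var_alpha = (1.0 - alpha) ** 2 + (alpha * s) ** 2   # Var[x_alpha]
    cov = alpha * s ** 2 - (1.0 - alpha)                # Cov[x1 - x0, x_alpha]
    return mu + cov / var_alpha * (x - alpha * mu)


def iterative_deblend(x0, mu, s, steps=200):
    """Move samples from alpha = 0 to alpha = 1 in `steps` Euler steps,
    each following the expected deblending direction (hypothetical helper)."""
    x = np.array(x0, dtype=float)
    for k in range(steps):
        alpha = k / steps
        x = x + deblend_direction(x, alpha, mu, s) / steps
    return x


rng = np.random.default_rng(0)
x0 = rng.standard_normal(20_000)           # samples from the first density
x1 = iterative_deblend(x0, mu=3.0, s=0.5)  # mapped toward the second density
print("mean:", x1.mean(), "std:", x1.std())
```

The procedure contains no randomness after the initial draw of `x0`, so the same input sample always lands on the same output. In this Gaussian toy case the mapped samples should have mean close to `mu` and standard deviation close to `s`, up to Euler discretization and sampling error.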