Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective, leveraging solely the energy function and its gradient (and no data samples), to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions because the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape, enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant $n$-body particle systems. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2$-$5\times$ faster, which allows it to be the first method to train using energy on the challenging $55$-particle Lennard-Jones system.
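To make the idea of a score target built only from the energy and its gradient more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian perturbation $x_t = x_0 + \sigma_t \epsilon$ and estimates the score of the noised density by a self-normalized Monte Carlo average of $-\nabla E$, weighted by $\exp(-E)$. The function names (`energy`, `grad_energy`, `mc_score`), the toy quadratic energy, and the sample count are illustrative assumptions that do not appear in the abstract.

```python
import numpy as np


def energy(x):
    """Toy energy E(x) = ||x||^2 / 2, i.e. an unnormalized standard Gaussian (assumption)."""
    return 0.5 * np.sum(x**2, axis=-1)


def grad_energy(x):
    """Analytic gradient of the toy energy above."""
    return x


def mc_score(x_t, sigma_t, energy_fn, grad_fn, num_samples, rng):
    """Monte Carlo estimate of the score of the Gaussian-smoothed density at x_t.

    Uses only energy evaluations and energy gradients, no data samples:
    score(x_t) ~= sum_i w_i * (-grad E(x_i)), with x_i = x_t + sigma_t * eps_i
    and weights w_i = softmax_i(-E(x_i)).
    """
    eps = rng.standard_normal((num_samples, x_t.shape[-1]))
    x0 = x_t + sigma_t * eps          # candidate clean points around x_t
    log_w = -energy_fn(x0)            # unnormalized log-weights
    log_w = log_w - log_w.max()       # stabilize the exponentials
    w = np.exp(log_w)
    w = w / w.sum()                   # self-normalized importance weights
    return -(w[:, None] * grad_fn(x0)).sum(axis=0)


rng = np.random.default_rng(0)
x_t = np.array([1.0, -0.5])
sigma_t = 1.0

estimate = mc_score(x_t, sigma_t, energy, grad_energy, 20_000, rng)

# For this toy energy the smoothed density is N(0, (1 + sigma_t^2) I),
# so the exact score is available in closed form for comparison.
exact = -x_t / (1.0 + sigma_t**2)
```

In a sampler-training loop of the kind the abstract describes, an estimate like this could serve as the regression target for a score network at points drawn from the current sampler. How the actual objective, noise schedule, and sampling buffer are set up is not specified in the abstract and is left open here.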