We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of objectives in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures). We further show that an appropriate choice of coarse time discretization during training allows greatly improved sample efficiency and the use of time-local objectives, achieving competitive performance on standard sampling benchmarks with reduced computational cost.
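To make concrete the kind of objectives the abstract refers to, the following is a minimal sketch (not this paper's exact formulation) of GFlowNet-style losses for a diffusion sampler discretized with $K$ Euler–Maruyama steps. The notation is assumed from the standard GFlowNet literature: $p_F$ is the learned generative (forward) kernel, $p_B$ the noising (backward) kernel, $R(x) = e^{-E(x)}$ the unnormalized Boltzmann density, and $Z_\theta$, $F_\theta$ are a learned normalizing constant and flow function.

% Sketch under assumed notation; a forward kernel from an Euler-Maruyama
% step of the neural SDE dX_t = u_theta(X_t, t) dt + sigma dW_t:
\[
  p_F(x_{k+1} \mid x_k) = \mathcal{N}\!\left(x_{k+1};\; x_k + u_\theta(x_k, t_k)\,\Delta t,\; \sigma^2 \Delta t\, I\right).
\]
% Trajectory balance: a global objective over a full path x_0, ..., x_K.
\[
  \mathcal{L}_{\mathrm{TB}}(x_{0:K})
  = \left(
      \log \frac{Z_\theta \prod_{k=0}^{K-1} p_F(x_{k+1} \mid x_k)}
                {R(x_K) \prod_{k=0}^{K-1} p_B(x_k \mid x_{k+1})}
    \right)^{\!2}.
\]
% Detailed balance: a time-local objective, one term per transition,
% with boundary condition F_theta(x_K) = R(x_K).
\[
  \mathcal{L}_{\mathrm{DB}}(x_k, x_{k+1})
  = \left(
      \log \frac{F_\theta(x_k)\, p_F(x_{k+1} \mid x_k)}
                {F_\theta(x_{k+1})\, p_B(x_k \mid x_{k+1})}
    \right)^{\!2}.
\]

The detailed-balance form is an example of the time-local objectives mentioned above: each term depends only on a single transition, so a coarse time discretization reduces the number of simulation steps needed per gradient update.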