We introduce Equilibrium Matching (EqM), a generative modeling framework built from an equilibrium dynamics perspective. EqM discards the non-equilibrium, time-conditional dynamics in traditional diffusion and flow-based generative models and instead learns the equilibrium gradient of an implicit energy landscape. Through this approach, we can adopt an optimization-based sampling process at inference time, where samples are obtained by gradient descent on the learned landscape with adjustable step sizes, adaptive optimizers, and adaptive compute. EqM empirically surpasses the generation performance of diffusion/flow models, achieving an FID of 1.90 on ImageNet 256$\times$256. We also provide theoretical justification that EqM learns and samples from the data manifold. Beyond generation, EqM is a flexible framework that naturally handles tasks including denoising of partially noised images, OOD detection, and image composition. By replacing time-conditional velocities with a unified equilibrium landscape, EqM offers a tighter bridge between flow and energy-based models and a simple route to optimization-driven inference.
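The optimization-based sampling described above can be sketched as plain gradient descent on a learned gradient field. The snippet below is a minimal illustration, not EqM's trained model: the field `grad` is a hypothetical stand-in (the gradient of a toy quadratic energy centered at a "data" point `mu`), and the update rule simply iterates $x \leftarrow x - \eta \nabla E(x)$ from Gaussian noise, with the step size and step count freely adjustable.

```python
import numpy as np

mu = np.array([1.0, -2.0])  # toy "data" mode (assumption for illustration)

def grad(x):
    # Gradient of the toy equilibrium energy E(x) = 0.5 * ||x - mu||^2;
    # in EqM this would be the learned gradient field.
    return x - mu

def sample(steps=200, eta=0.1, seed=0):
    # Optimization-based sampling: start from noise, descend the landscape.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(2)      # initialize from Gaussian noise
    for _ in range(steps):          # step size eta and step count are
        x = x - eta * grad(x)       # adjustable at inference time
    return x

x_sampled = sample()
```

In this toy landscape the iterates converge to the energy minimum `mu`; in practice one could swap the fixed-step loop for an adaptive optimizer or stop early once the gradient norm is small, which is what makes the compute budget adjustable.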