In this paper, we propose a novel numerical scheme for optimizing the gradient flows that arise in learning energy-based models (EBMs). Taking a physical-simulation perspective, we recast the problem of approximating the gradient flow in terms of the optimal transport (i.e., Wasserstein) metric. In EBMs, the learning process alternates stepwise sampling with estimation of the data distribution. It follows the functional gradient that minimizes the global relative entropy between the current distribution and the target (real) distribution, and can therefore be viewed as a system of particles moving from a disordered state toward the target manifold. Previous learning schemes mainly minimize, in each learning step, the KL divergence between the distributions at consecutive times. However, they are prone to getting trapped in this local KL divergence because they project non-smooth information onto a smooth manifold, which runs counter to the optimal transport principle. To address this problem, we derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation. Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme for approximating real data densities. We also derive this near-proximal scheme and provide the equations for its numerical computation. Extensive experiments demonstrate the practical superiority and potential of the proposed scheme in fitting complex distributions and in generating high-quality, high-dimensional data with neural EBMs.
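To make the particle picture concrete, the following is a minimal sketch of plain (unadjusted) Langevin dynamics, the standard particle discretization of the Fokker-Planck equation, whose solution is the Wasserstein gradient flow of the relative entropy to a target density proportional to exp(-E). It is not the second-order scheme proposed in the paper. The one-dimensional Gaussian energy, the function names `grad_energy` and `langevin_particles`, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np


def grad_energy(x, mean=2.0, std=0.5):
    """Gradient of a toy energy E(x) = (x - mean)^2 / (2 * std^2).

    Assumption: a hand-written Gaussian energy stands in for a neural EBM.
    """
    return (x - mean) / std**2


def langevin_particles(n_particles=5000, n_steps=500, step=0.01, seed=0):
    """Move particles with unadjusted Langevin dynamics.

    Update rule: x <- x - step * grad E(x) + sqrt(2 * step) * noise.
    The particle density approximately follows the Fokker-Planck equation
    with stationary density proportional to exp(-E).
    """
    rng = np.random.default_rng(seed)
    # "Disordered" initial state: particles spread uniformly over [-4, 4].
    x = rng.uniform(-4.0, 4.0, size=n_particles)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        x = x - step * grad_energy(x) + np.sqrt(2.0 * step) * noise
    return x


samples = langevin_particles()
print("sample mean:", samples.mean())
print("sample variance:", samples.var())
```

With these illustrative settings the sample mean and variance should land close to the target values 2.0 and 0.25, up to Monte Carlo error and a small bias from the finite step size.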