This paper proposes a new theoretical lens for viewing Wasserstein generative adversarial networks (WGANs). To minimize the Wasserstein-1 distance between the true data distribution and our estimate of it, we derive a distribution-dependent ordinary differential equation (ODE) that represents the gradient flow of the Wasserstein-1 loss, and we show that a forward Euler discretization of this ODE converges. This inspires a new class of generative models, which we call W1-FE, that naturally integrates persistent training. When persistent training is turned off, we prove that W1-FE reduces to WGAN. As we intensify persistent training, W1-FE outperforms WGAN in training experiments from low to high dimensions, in terms of both convergence speed and training results. Intriguingly, these benefits are reaped only when persistent training is carefully integrated through our ODE perspective: as demonstrated numerically, naively incorporating persistent training into WGAN, without relying on our ODE framework, can significantly worsen training results.
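To make the described scheme concrete, below is a minimal PyTorch-style sketch of one W1-FE update, based only on the high-level description in this abstract: a forward Euler step along the (approximate) Wasserstein-1 gradient flow, followed by K persistent generator updates. All names (`generator`, `critic`, `gen_opt`, `tau`, `K`) are illustrative assumptions, not the paper's reference implementation; the critic is assumed to have been trained beforehand, as in a standard WGAN, to approximate the Kantorovich potential.

```python
import torch

def w1fe_step(generator, critic, gen_opt, z, tau=0.1, K=5):
    """One hypothetical W1-FE update: Euler target, then K persistent fits."""
    # Forward Euler step: move the current generated samples along the
    # negative gradient of the critic, which (up to sign conventions in how
    # the critic is trained) approximates the Kantorovich potential.
    y = generator(z).detach().requires_grad_(True)
    grad = torch.autograd.grad(critic(y).sum(), y)[0]
    target = (y - tau * grad).detach()

    # Persistent training: fit the generator to the Euler target for K
    # inner steps. K = 1 corresponds to the persistent-training-off regime
    # the abstract equates with WGAN; larger K intensifies persistency.
    for _ in range(K):
        gen_opt.zero_grad()
        loss = torch.mean((generator(z) - target) ** 2)
        loss.backward()
        gen_opt.step()
    return loss.item()
```

The key design point this sketch illustrates is that persistent training enters through the fixed Euler target: the generator is repeatedly regressed onto one discretized gradient-flow step, rather than taking K independent adversarial updates against the critic.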