One training strategy for generative models is to minimize the Jensen--Shannon divergence between the model distribution and the data distribution. Since the data distribution is unknown, generative adversarial networks (GANs) formulate this problem as a game between two models, a generator and a discriminator. Training can then be cast in game-theoretic terms as the problem of finding a local Nash equilibrium (LNE). This optimization problem is far more challenging than the single-objective setting, and it does not seem feasible to derive guarantees of stability or optimality for existing methods. Here, we use the conjugate gradient method to reliably and efficiently solve the LNE problem in GANs. We give a proof and convergence analysis, under mild assumptions, showing that the proposed method converges to an LNE with three different learning rate update rules, including a constant learning rate. Finally, we demonstrate that the proposed method outperforms stochastic gradient descent (SGD) and momentum SGD in terms of best Fréchet inception distance (FID) score and outperforms Adam on average. The code is available at \url{https://github.com/Hiroki11x/ConjugateGradient_GAN}.
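
For reference, the link between the adversarial game and the Jensen--Shannon divergence can be written out in the standard GAN formulation. This sketch is not specific to the proposed method. The symbols $G$, $D$, $p_{\mathrm{data}}$, $p_g$, $p_z$, $L_G$, $L_D$, $\theta$, and $w$ are notation assumed here for illustration and do not appear in the abstract.
\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right].
\]
For a fixed generator, the optimal discriminator is $D^*(x) = p_{\mathrm{data}}(x) / \left(p_{\mathrm{data}}(x) + p_g(x)\right)$, and substituting it gives
\[
V(D^*, G) = -\log 4 + 2\,\mathrm{JSD}\left(p_{\mathrm{data}} \,\|\, p_g\right),
\]
so minimizing over $G$ minimizes the Jensen--Shannon divergence. Writing $L_G(\theta, w)$ and $L_D(\theta, w)$ for the generator and discriminator losses, a point $(\theta^*, w^*)$ is a local Nash equilibrium if, for all $\theta$ and $w$ in some neighborhoods of $\theta^*$ and $w^*$,
\[
L_G(\theta^*, w^*) \le L_G(\theta, w^*) \quad \text{and} \quad L_D(\theta^*, w^*) \le L_D(\theta^*, w).
\]
A generic nonlinear conjugate gradient step for a parameter vector $x_n$ with gradient $g_n$, search direction $d_n$, and learning rate $\alpha_n$ takes the form
\[
d_n = -g_n + \beta_n d_{n-1}, \qquad x_{n+1} = x_n + \alpha_n d_n,
\]
where the Fletcher--Reeves choice $\beta_n = \|g_n\|^2 / \|g_{n-1}\|^2$ is one textbook option, shown here only as an example and not necessarily the rule used in this work.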