One of the training strategies for generative models is to minimize the Jensen--Shannon divergence between the model distribution and the data distribution. Since the data distribution is unknown, generative adversarial networks (GANs) formulate this problem as a game between two models, a generator and a discriminator. The training can be formulated in the context of game theory, with the local Nash equilibrium (LNE) as the solution concept. This optimization problem is far more challenging than the single-objective setting, and it does not seem feasible to derive guarantees of stability or optimality for the existing methods. Here, we use the conjugate gradient method to reliably and efficiently solve the LNE problem in GANs. We give a proof and convergence analysis under mild assumptions showing that the proposed method converges to an LNE with three different learning-rate update rules, including a constant learning rate. Finally, we demonstrate that the proposed method outperforms stochastic gradient descent (SGD) and momentum SGD in terms of the best Fréchet inception distance (FID) score and outperforms Adam on average. The code is available at \url{https://github.com/Hiroki11x/ConjugateGradient_GAN}.
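For orientation, the objects named in the abstract can be written out as follows. This is a sketch in assumed notation and is not taken from the paper itself: $V$, $p_g$, $p_z$, $L_G$, $L_D$, $\theta$, $w$, $g_n$, $d_n$, $\alpha_n$, and $\beta_n$ are symbols introduced here for illustration. The first two lines are the standard GAN minimax objective and its well-known relation to the Jensen--Shannon divergence at the optimal discriminator; the third is the usual definition of a local Nash equilibrium for a two-player game; the last is the generic form of a conjugate-gradient-type update. The specific choice of $\beta_n$ and the way the update is applied to each player are left to the paper.

```latex
% Standard GAN minimax objective (assumed notation).
\begin{align}
  \min_{G}\max_{D}\; V(D,G)
    &= \mathbb{E}_{x\sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
     + \mathbb{E}_{z\sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr], \\
% For a fixed generator, the optimal discriminator is
% D^*(x) = p_data(x) / (p_data(x) + p_g(x)), which gives:
  V(D^{*},G)
    &= -\log 4 + 2\,\mathrm{JSD}\bigl(p_{\mathrm{data}} \,\|\, p_{g}\bigr).
\end{align}

% Local Nash equilibrium: neither player can lower its own loss
% by a small unilateral change of its own parameters.
\begin{align}
  L_G(\theta^{*}, w^{*}) &\le L_G(\theta, w^{*})
    \quad \text{for all } \theta \text{ in a neighborhood of } \theta^{*}, \\
  L_D(\theta^{*}, w^{*}) &\le L_D(\theta^{*}, w)
    \quad \text{for all } w \text{ in a neighborhood of } w^{*}.
\end{align}

% Generic conjugate-gradient-type update for one player's parameters x_n,
% with (stochastic) gradient g_n, learning rate \alpha_n, and d_0 = -g_0.
\begin{align}
  d_{n}   &= -g_{n} + \beta_{n}\, d_{n-1}, \\
  x_{n+1} &= x_{n} + \alpha_{n}\, d_{n}.
\end{align}
```

Setting $\beta_n = 0$ in the last pair of equations recovers plain (stochastic) gradient descent, which is one way to see the method as a generalization of the SGD baseline mentioned in the abstract.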