Previous results have shown that a two time-scale update rule (TTUR) using different learning rates, such as different constant rates or different decaying rates, is useful for training generative adversarial networks (GANs) in theory and in practice. Moreover, not only the learning rate but also the batch size is important for training GANs with TTURs, and both affect the number of steps needed for training. This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates. We theoretically show that, for a TTUR with constant learning rates, the number of steps needed to find stationary points of the loss functions of both the discriminator and the generator decreases as the batch size increases and that there exists a critical batch size minimizing the stochastic first-order oracle (SFO) complexity. Then, we use the Fréchet inception distance (FID) as the performance measure for training and provide numerical results indicating that the number of steps needed to achieve a low FID score decreases as the batch size increases and that the SFO complexity increases once the batch size exceeds the measured critical batch size. Moreover, we show that the measured critical batch sizes are close to the sizes estimated from our theoretical results.
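To make the notion of a measured critical batch size concrete, the following minimal sketch assumes that SFO complexity is counted as the number of training steps multiplied by the batch size, that is, the total number of stochastic gradient evaluations; the abstract does not spell out this definition. The function names and the step counts are hypothetical and purely illustrative; they are not results from the paper.

```python
from typing import Dict


def sfo_complexity(batch_size: int, steps: int) -> int:
    """Total stochastic gradient evaluations, assuming SFO = steps * batch size."""
    return batch_size * steps


def critical_batch_size(steps_by_batch: Dict[int, int]) -> int:
    """Return the batch size whose measured SFO complexity is smallest."""
    return min(steps_by_batch, key=lambda b: sfo_complexity(b, steps_by_batch[b]))


# Hypothetical, made-up measurements: steps needed to reach a target FID
# at each batch size. The step count falls as the batch size grows.
steps_by_batch = {8: 40000, 16: 18000, 32: 8500, 64: 4500, 128: 3000, 256: 2400}

sfo = {b: sfo_complexity(b, n) for b, n in steps_by_batch.items()}
b_star = critical_batch_size(steps_by_batch)

for b in sorted(sfo):
    print(f"batch={b:4d}  steps={steps_by_batch[b]:6d}  SFO={sfo[b]:7d}")
print("measured critical batch size:", b_star)
```

With these illustrative numbers, the step count keeps decreasing as the batch size grows, while the SFO complexity falls, reaches its minimum at a batch size of 32, and rises for every larger batch size, which is the qualitative pattern the abstract describes.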