Previous results have shown that a two time-scale update rule (TTUR) using different learning rates, such as different constant rates or different decaying rates, is useful for training generative adversarial networks (GANs) in theory and in practice. Beyond the learning rate, the batch size is also important for training GANs with TTURs, and both affect the number of steps needed for training. This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates. We theoretically show that, for a TTUR with constant learning rates, the number of steps needed to find stationary points of the loss functions of both the discriminator and the generator decreases as the batch size increases, and that there exists a critical batch size minimizing the stochastic first-order oracle (SFO) complexity. We then use the Fréchet inception distance (FID) as the performance measure for training and provide numerical results indicating that the number of steps needed to achieve a low FID score decreases as the batch size increases, and that the SFO complexity increases once the batch size exceeds the measured critical batch size. Moreover, we show that the measured critical batch sizes are close to the sizes estimated from our theoretical results.
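To make the notion of a critical batch size concrete, the following is a minimal sketch that treats SFO complexity as the number of steps multiplied by the batch size, that is, the total number of stochastic gradient evaluations. The function `toy_steps` and its constants `a`, `c`, and `eps_sq` are hypothetical and chosen only for illustration; they are not the bound derived in this paper. They are picked so that the step count decreases with batch size while the product of steps and batch size has an interior minimum.

```python
def toy_steps(batch_size, a=1000.0, c=16.0, eps_sq=1.0):
    """Hypothetical steps-to-target model (an assumption, not the paper's bound).

    The step count decreases monotonically in batch_size for batch_size > c / eps_sq.
    """
    if eps_sq * batch_size <= c:
        raise ValueError("batch size too small for this toy model")
    return a * batch_size / (eps_sq * batch_size - c)


def sfo_complexity(batch_size, steps_fn=toy_steps):
    """SFO complexity: steps needed times gradients evaluated per step."""
    return steps_fn(batch_size) * batch_size


def critical_batch_size(candidates, steps_fn=toy_steps):
    """Return the candidate batch size with the smallest SFO complexity."""
    return min(candidates, key=lambda b: sfo_complexity(b, steps_fn))


candidates = [24, 32, 48, 64, 128, 256]
for b in candidates:
    # Steps keep falling as b grows, but SFO complexity eventually rises.
    print(b, round(toy_steps(b)), round(sfo_complexity(b)))
print("critical batch size:", critical_batch_size(candidates))
```

Under this toy model the SFO complexity is minimized at `2 * c / eps_sq`, which is 32 for the default constants. In an experiment, `steps_fn` would instead be replaced by the measured number of steps needed to reach a target FID at each batch size.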