Previous results have shown that a two time-scale update rule (TTUR) using different learning rates, such as different constant rates or different decaying rates, is useful for training generative adversarial networks (GANs) in theory and in practice. Moreover, not only the learning rate but also the batch size is important for training GANs with TTURs, and both affect the number of steps needed for training. This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates. We theoretically show that, for a TTUR with constant learning rates, the number of steps needed to find stationary points of the loss functions of both the discriminator and the generator decreases as the batch size increases, and that there exists a critical batch size minimizing the stochastic first-order oracle (SFO) complexity. We then use the Fréchet inception distance (FID) as the performance measure for training and provide numerical results indicating that the number of steps needed to achieve a low FID score decreases as the batch size increases, and that the SFO complexity increases once the batch size exceeds the measured critical batch size. Finally, we show that the measured critical batch sizes are close to the sizes estimated from our theoretical results.
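The interplay between steps and SFO complexity can be illustrated numerically. The sketch below is only an illustration and is not the paper's bound. It assumes a step-count model K(b) = A·b / (ε²·b − B), where A and B are hypothetical positive constants and ε is the target stationarity accuracy. This form was chosen only because it reproduces the two qualitative properties stated above.

- K(b) decreases as the batch size b grows.
- The SFO complexity N(b) = b·K(b) (each step draws b stochastic gradients) has a unique minimizer.

Setting dN/db = 0 gives the critical batch size b* = 2B/ε² under this assumed model.

```python
import math


def steps_needed(b: float, A: float, B: float, eps: float) -> float:
    """Hypothetical step-count model K(b) = A*b / (eps^2*b - B).

    Assumption for illustration only: K is finite only when b > B/eps^2,
    and it decreases monotonically in b on that range.
    """
    denom = eps**2 * b - B
    if denom <= 0:
        return math.inf  # batch too small to reach accuracy eps in this model
    return A * b / denom


def sfo_complexity(b: float, A: float, B: float, eps: float) -> float:
    """SFO complexity N(b) = b * K(b): total stochastic gradients computed."""
    return b * steps_needed(b, A, B, eps)


def critical_batch_size(B: float, eps: float) -> float:
    """Minimizer of N(b) = A*b^2 / (eps^2*b - B).

    dN/db is proportional to b*(eps^2*b - 2B), so it vanishes at b = 2B/eps^2.
    N decreases below this point and increases above it.
    """
    return 2 * B / eps**2


if __name__ == "__main__":
    A, B, eps = 1.0, 2.0, 0.5  # hypothetical constants
    b_star = critical_batch_size(B, eps)
    # Scan feasible integer batch sizes (b > B/eps^2 = 8) for the empirical minimizer.
    best = min(range(9, 201), key=lambda b: sfo_complexity(b, A, B, eps))
    print(f"analytic b* = {b_star}, scanned minimizer = {best}")
    for b in (12, 16, 32, 64):
        print(b, steps_needed(b, A, B, eps), sfo_complexity(b, A, B, eps))
```

With these constants, the scan should agree with the analytic value b* = 16. In the printout, K(b) keeps falling as b grows, while N(b) rises again past 16. This mirrors the qualitative behaviour described in the abstract without claiming to reproduce its constants.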