When training neural networks with low-precision computation, rounding errors often cause the optimizer to stagnate or otherwise harm its convergence; in this paper we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-\L{}ojasiewicz inequality. Within this context we show that, in contrast, biased stochastic rounding errors may be beneficial: choosing a suitable rounding strategy eliminates the vanishing gradient problem and forces the rounding bias to act in a descent direction. Furthermore, we obtain a bound on the convergence rate that is tighter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performance of various rounding strategies when optimizing several examples using low-precision fixed-point number formats.
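To make the contrast between unbiased and biased stochastic rounding concrete, the sketch below quantizes values onto a fixed-point grid and, optionally, shifts the round-up probability toward the sign of a supplied descent direction. This is a minimal illustration under our own assumptions, not the paper's exact scheme: the function `stochastic_round`, the bias parameter `eps`, and the toy quadratic objective are hypothetical names introduced here for exposition.

```python
import numpy as np

def stochastic_round(x, frac_bits, rng, eps=0.0, direction=None):
    """Round x onto a fixed-point grid with spacing 2**-frac_bits.

    With eps=0 this is classical unbiased stochastic rounding: the grid
    point above x is chosen with probability equal to the fractional
    residue, so the rounding error has zero mean.  With eps>0 and a
    direction vector, that probability is shifted by eps toward the sign
    of `direction`, so the expected rounding error points along it.
    """
    scale = 2.0 ** frac_bits
    y = x * scale
    lo = np.floor(y)
    p_up = y - lo                       # fractional residue in [0, 1)
    if direction is not None:
        # Bias the coin toss toward the chosen (descent) direction.
        p_up = np.clip(p_up + eps * np.sign(direction), 0.0, 1.0)
    up = rng.random(np.shape(y)) < p_up
    return (lo + up) / scale

# Toy usage: gradient descent on f(w) = 0.5*||w||^2, which satisfies the
# Polyak-Lojasiewicz inequality with mu = 1.  Rounding each iterate with
# the bias pointed along -grad keeps small steps from being rounded away.
rng = np.random.default_rng(0)
w = np.array([0.30, -0.20])
for _ in range(100):
    grad = w                            # gradient of 0.5*||w||^2
    step = w - 0.1 * grad               # exact (unrounded) update
    w = stochastic_round(step, frac_bits=4, rng=rng, eps=0.25,
                         direction=-grad)
```

In this sketch, with `eps=0` the expected rounded iterate equals the exact step, which is precisely why updates much smaller than the grid spacing (here $2^{-4}$) can be rounded away with high probability and cause stagnation; with `eps>0` the expected rounding error itself points downhill.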