We consider a variant of stochastic gradient descent (SGD) with a random learning rate and establish its convergence properties. SGD is a widely used stochastic optimization algorithm in machine learning, especially in deep learning. Numerous studies have analyzed the convergence properties of SGD and its simplified variants. Among these, convergence analysis based on a stationary distribution of the updated parameters provides generalizable results. However, obtaining a stationary distribution requires that the update direction of the parameters not degenerate, which limits the SGD variants to which the analysis applies. In this study, we consider a novel SGD variant, Poisson SGD, which has degenerate parameter update directions and instead uses a random learning rate. We demonstrate that the distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions on the loss function. Based on this result, we further show that Poisson SGD finds global minima in non-convex optimization problems, and we also evaluate the generalization error using this approach. As a proof technique, we approximate the distribution of the parameter under Poisson SGD by that of the bouncy particle sampler (BPS) and derive its stationary distribution using theoretical advances on piecewise deterministic Markov processes (PDMPs).