Stochastic gradient descent (SGD) has been a go-to algorithm for the nonconvex stochastic optimization problems arising in machine learning. Its convergence theory, however, often requires strong assumptions. We present a full-scope convergence study of biased nonconvex SGD, covering weak convergence, function-value convergence, and global convergence, and we also derive the corresponding convergence rates and complexities, all under conditions that are relatively mild compared with the existing literature.
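As a point of reference only, a minimal sketch of what a biased SGD iteration typically looks like is given below; the symbols (step size, filtration, bias term) are assumptions for illustration and are not taken from the abstract.

\[
  x_{k+1} = x_k - \gamma_k\, g_k,
  \qquad
  \mathbb{E}\!\left[\, g_k \mid \mathcal{F}_k \,\right] = \nabla f(x_k) + b_k ,
\]

where $\gamma_k$ is the step size, $g_k$ a stochastic gradient estimate, and $b_k$ a (possibly nonvanishing) bias term; the unbiased case corresponds to $b_k = 0$.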