We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework. The idea rests on the observation that certain parameters in these models are easier to optimize than others. In this paper, we focus on models in which some parameters enter linearly, which is common in machine learning. In each iteration of the proposed algorithm, the linear parameters are updated by the recursive least squares (RLS) algorithm, which is equivalent to a stochastic Newton method; then, based on the updated linear parameters, the nonlinear parameters are updated by stochastic gradient descent (SGD). The proposed algorithm can be understood as a stochastic approximation version of the block coordinate gradient descent approach, in which one part of the parameters is updated by a second-order SGD method while the other part is updated by a first-order SGD method. Global convergence of the proposed online algorithm in the non-convex case is established in terms of the expected violation of a first-order optimality condition. Numerical experiments show that the proposed method accelerates convergence significantly and yields more robust training and test performance than other popular learning algorithms. Moreover, our algorithm is less sensitive to the learning rate and outperforms the recently proposed slimTrain algorithm. The code has been uploaded to GitHub for validation.
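To make the alternating update concrete, the following is a minimal sketch of one iteration and is not the authors' released code. It assumes a model the abstract does not specify: a single hidden layer with scalar output `w @ tanh(theta @ x)` and squared loss, where `w` holds the linear parameters and `theta` the nonlinear ones. All function names, the forgetting factor `lam`, and the learning rate `lr` are hypothetical choices made for illustration.

```python
import numpy as np


def features(theta, x):
    """Hidden features h = tanh(theta @ x); theta holds the nonlinear parameters (assumed model)."""
    return np.tanh(theta @ x)


def rls_update(w, P, h, y, lam=1.0):
    """One recursive least squares step for the linear weights w.

    P tracks the inverse of the (weighted) feature correlation matrix;
    lam is a forgetting factor, with lam=1.0 giving standard RLS.
    """
    Ph = P @ h
    k = Ph / (lam + h @ Ph)              # gain vector
    e = y - w @ h                        # a priori prediction error
    w_new = w + k * e
    P_new = (P - np.outer(k, Ph)) / lam  # uses the symmetry of P
    return w_new, P_new


def sgd_update(theta, w, x, y, lr=0.01):
    """One SGD step on theta for the loss 0.5 * (w @ tanh(theta @ x) - y)**2."""
    h = features(theta, x)
    r = w @ h - y                                # residual with the current w
    grad = r * np.outer(w * (1.0 - h**2), x)     # chain rule through tanh
    return theta - lr * grad


def separable_step(theta, w, P, x, y, lr=0.01, lam=1.0):
    """One iteration: RLS on the linear block first, then SGD on the nonlinear block."""
    h = features(theta, x)
    w, P = rls_update(w, P, h, y, lam)       # second-order update of w
    theta = sgd_update(theta, w, x, y, lr)   # first-order update using the new w
    return theta, w, P
```

In use, one would initialize `w` to zeros and `P` to a large multiple of the identity, then call `separable_step` once per incoming sample `(x, y)`. The ordering matters in this sketch: the gradient for `theta` is evaluated at the already-updated `w`, mirroring the "based on the updated linear parameters" step described above.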