The stochastic gradient descent method and its variants are the core optimization algorithms for machine learning problems, and they achieve good convergence rates, especially when they are fine-tuned for the application at hand. Although this tuning can be computationally expensive, recent work has shown that the cost can be reduced by line search methods that iteratively adjust the stepsize. We propose an alternative to stochastic line search: a new algorithm, SMB, based on forward step model building. The model building step incorporates second-order information, which allows adjusting not only the stepsize but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide a convergence rate analysis and show experimentally that the proposed algorithm achieves faster convergence and better generalization on well-known test problems. More precisely, SMB requires less tuning and shows performance comparable to other adaptive methods.
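For readers unfamiliar with the stochastic line search baseline that the abstract contrasts against, the following is a minimal sketch of one SGD step whose stepsize is chosen by classical Armijo backtracking on the current minibatch. It is not the model building step proposed here. The least-squares loss, the function names (`minibatch_loss_and_grad`, `armijo_sgd_step`, `run`) and the constants `alpha0`, `c` and `beta` are all assumptions chosen for illustration.

```python
import numpy as np


def minibatch_loss_and_grad(w, X, y):
    """Loss 0.5 * mean((Xw - y)^2) and its gradient on one minibatch."""
    r = X @ w - y
    loss = 0.5 * np.mean(r ** 2)
    grad = X.T @ r / len(y)
    return loss, grad


def armijo_sgd_step(w, X, y, alpha0=1.0, c=0.1, beta=0.5, max_backtracks=20):
    """One SGD step with backtracking on the same minibatch (illustrative only).

    The trial stepsize is shrunk by `beta` until the Armijo sufficient-decrease
    condition  f(w - alpha * g) <= f(w) - c * alpha * ||g||^2  holds.
    """
    loss, grad = minibatch_loss_and_grad(w, X, y)
    grad_sq = grad @ grad
    alpha = alpha0
    for _ in range(max_backtracks):
        trial = w - alpha * grad
        trial_loss, _ = minibatch_loss_and_grad(trial, X, y)
        if trial_loss <= loss - c * alpha * grad_sq:
            return trial, alpha
        alpha *= beta
    # No acceptable stepsize found: keep the current iterate.
    return w, 0.0


def run(seed=0, steps=200, batch=16):
    """Fit a noise-free synthetic linear model (hypothetical toy problem)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true
    w = np.zeros(5)
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        w, _ = armijo_sgd_step(w, X[idx], y[idx])
    return w, w_true
```

Calling `run()` returns the fitted and true weight vectors of the toy problem. Each backtracking iteration costs an extra function evaluation on the minibatch, which is the overhead a line search trades for reduced manual stepsize tuning.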