The stochastic gradient descent method and its variants are the core optimization algorithms for machine learning problems, and they achieve good convergence rates on them. These rates are obtained especially when the algorithms are fine-tuned for the application at hand. Although this tuning can be computationally expensive, recent work has shown that the cost can be reduced by line search methods that iteratively adjust the step length. We propose an alternative to stochastic line search: a new algorithm, SMB, based on forward-step model building. The model building step incorporates second-order information, which allows the algorithm to adjust not only the step length but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide a convergence rate analysis and show experimentally that the proposed algorithm achieves faster convergence and better generalization on well-known test problems. More precisely, SMB requires less tuning and shows performance comparable to other adaptive methods.
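To make the idea of a trial step followed by a per-group model concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the SMB update itself. It takes a plain gradient trial step, checks an Armijo-type sufficient decrease condition, and, if the check fails, rescales the step of each parameter group by minimizing a one-dimensional quadratic model whose curvature is estimated from the gradient difference between the trial point and the current point. Unlike the method described above, this simplification changes only the step length per group, not the direction. The names `grouped_model_step`, `toy_loss`, `toy_grad`, and the constants `lr` and `c` are hypothetical and chosen for illustration only.

```python
import numpy as np

def grouped_model_step(loss, grad, params, lr=1.0, c=1e-4):
    """One illustrative update; params maps group name -> ndarray (assumed layout)."""
    g = grad(params)
    # Trial (forward) step: a plain gradient step for every parameter group.
    step = {k: -lr * g[k] for k in params}
    trial = {k: params[k] + step[k] for k in params}
    g_sq = sum(float(np.sum(g[k] ** 2)) for k in params)
    # Armijo-type sufficient decrease check at the trial point.
    if loss(trial) <= loss(params) - c * lr * g_sq:
        return trial
    # Check failed: reuse the trial-point gradient as curvature information.
    g_trial = grad(trial)
    new = {}
    for k in params:
        s = step[k]
        y = g_trial[k] - g[k]            # gradient change along the trial step
        sy = float(np.sum(s * y))        # curvature estimate along s
        gs = float(np.sum(g[k] * s))     # directional derivative along s
        if sy > 0:
            # Minimizer of the model t*gs + 0.5*t^2*sy, capped at the trial step.
            t = min(1.0, -gs / sy)
        else:
            t = 0.5                      # assumed fallback: halve the step
        new[k] = params[k] + t * s
    return new

# Toy problem (an assumption for illustration): two groups with different curvature.
SCALES = {"w": 10.0, "v": 1.0}

def toy_loss(p):
    return 0.5 * sum(SCALES[k] * float(np.sum(p[k] ** 2)) for k in p)

def toy_grad(p):
    return {k: SCALES[k] * p[k] for k in p}

start = {"w": np.ones(2), "v": np.ones(2)}
result = grouped_model_step(toy_loss, toy_grad, start, lr=1.0)
```

On this toy quadratic the trial step overshoots badly in the high-curvature group `w`, so the sufficient decrease check fails and each group receives its own scaling factor. That is the sense in which a per-group model makes the step lengths adaptive without retuning a global learning rate.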