Trust-region (TR) and adaptive regularization using cubics (ARC) methods have very appealing theoretical properties for non-convex optimization: at each iteration they compute the function value, gradient, and Hessian matrix to obtain the next search direction and adjust their parameters. Although stochastic approximations greatly reduce the computational cost, theoretically guaranteeing the convergence rate is challenging. In this paper, we explore a family of stochastic TR and ARC methods that simultaneously allow inexact computation of the Hessian matrix, gradient, and function values. Our algorithms require far fewer propagations per iteration than TR and ARC. We prove that the iteration complexity to achieve $\epsilon$-approximate second-order optimality is of the same order as that established for exact computations in previous studies. Additionally, the mild conditions on inexactness can be met by random sampling in the finite-sum minimization problem. Numerical experiments on a non-convex problem support these findings and demonstrate that, with the same or a similar number of iterations, our algorithms require less computational overhead per iteration than current second-order methods.
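To make the idea of inexact function values, gradients, and Hessians concrete, the following is a minimal sketch of a sub-sampled trust-region loop on a toy one-dimensional finite-sum problem. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the objective $f(x) = \frac{1}{n}\sum_i (x^2 - a_i)^2$, the function names, the acceptance threshold `eta`, the radius update factors, and the choice to reuse one mini-batch for all three estimates are all hypothetical. The ARC variant is not shown.

```python
import random

# Toy non-convex finite-sum terms f_i(x) = (x^2 - a_i)^2 (an assumed example).
def f_i(x, a):
    return (x * x - a) ** 2

def g_i(x, a):  # first derivative of f_i
    return 4.0 * x * (x * x - a)

def h_i(x, a):  # second derivative of f_i
    return 12.0 * x * x - 4.0 * a

def mean_over(fn, x, batch):
    """Sub-sampled estimate: average of fn over the sampled terms."""
    return sum(fn(x, a) for a in batch) / len(batch)

def tr_step_1d(g, h, delta):
    """Exact minimizer of m(s) = g*s + 0.5*h*s^2 subject to |s| <= delta."""
    if h > 0 and abs(g) <= h * delta:
        return -g / h              # interior Newton step
    if g == 0:
        return delta               # zero gradient, non-positive curvature
    return -delta if g > 0 else delta  # boundary step against the gradient

def stochastic_tr(x, data, batch_size, iters=50, delta=1.0, eta=0.1, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        batch = rng.sample(data, batch_size)   # sample without replacement
        g = mean_over(g_i, x, batch)           # inexact gradient
        h = mean_over(h_i, x, batch)           # inexact Hessian
        s = tr_step_1d(g, h, delta)
        predicted = -(g * s + 0.5 * h * s * s) # model decrease
        if predicted <= 1e-12:
            break                              # no meaningful decrease predicted
        # Inexact function values on the same batch give the actual decrease.
        actual = mean_over(f_i, x, batch) - mean_over(f_i, x + s, batch)
        if actual / predicted >= eta:
            x, delta = x + s, 2.0 * delta      # accept step, enlarge radius
        else:
            delta = 0.5 * delta                # reject step, shrink radius
    return x
```

Setting `batch_size` to `len(data)` recovers the exact (deterministic) trust-region iteration on this toy problem; smaller batches trade accuracy of the three estimates for lower per-iteration cost. Starting from the saddle point `x = 0.0`, the negative sampled curvature is what moves the iterate away from it.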