We consider an extension of the Newton-MR algorithm for nonconvex unconstrained optimization to settings in which Hessian information is only approximately available. Under a particular noise model on the Hessian matrix, we investigate the iteration and operation complexities of this variant for achieving appropriate sub-optimality criteria in several nonconvex settings. We first consider functions satisfying the (generalized) Polyak-\L ojasiewicz condition, a special subclass of nonconvex functions, and show that, under certain conditions, our algorithm achieves a global linear convergence rate. We then turn to more general nonconvex settings, where the rate for obtaining first-order sub-optimality is shown to be sublinear. In all of these settings, we show that our algorithm converges regardless of the degree of approximation of the Hessian as well as the accuracy of the solution to the sub-problem. Finally, we compare the performance of our algorithm with several alternatives on a few machine learning problems.
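To make the setting concrete, the following is a minimal sketch (not the paper's implementation) of one inexact Newton-MR-style step on a toy 2-D quadratic. The exact Hessian is replaced by a perturbed one, standing in for the noise model on Hessian information, and the subproblem is solved only loosely by a conjugate-residual (CR) iteration, which is mathematically equivalent to MINRES on symmetric systems. All function names and parameter values here are illustrative assumptions.

```python
# Hedged illustration: inexact Newton-MR-style step on a toy problem.
# f, grad, hess_approx, and all tolerances are made up for this sketch.

def f(x):
    # simple strongly convex test objective f(x) = 0.5*(x0^2 + 10*x1^2)
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    return [x[0], 10.0 * x[1]]

def hess_approx(x, eps=0.05):
    # exact Hessian diag(1, 10) plus a small symmetric perturbation,
    # mimicking approximate Hessian information
    return [[1.0 + eps, eps], [eps, 10.0 - eps]]

def mv(H, v):
    return [H[0][0] * v[0] + H[0][1] * v[1],
            H[1][0] * v[0] + H[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def cr_solve(H, b, tol=1e-2, max_iter=10):
    # conjugate-residual iteration for symmetric H p = b,
    # stopped early (inexact subproblem solve)
    x = [0.0, 0.0]
    r = b[:]
    p = r[:]
    Hr = mv(H, r)
    Hp = Hr[:]
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        alpha = dot(r, Hr) / dot(Hp, Hp)
        x = [x[0] + alpha * p[0], x[1] + alpha * p[1]]
        r_new = [r[0] - alpha * Hp[0], r[1] - alpha * Hp[1]]
        Hr_new = mv(H, r_new)
        beta = dot(r_new, Hr_new) / dot(r, Hr)
        p = [r_new[0] + beta * p[0], r_new[1] + beta * p[1]]
        Hp = [Hr_new[0] + beta * Hp[0], Hr_new[1] + beta * Hp[1]]
        r, Hr = r_new, Hr_new
    return x

x = [3.0, -2.0]
g = grad(x)
step = cr_solve(hess_approx(x), [-g[0], -g[1]])

# backtracking (Armijo) line search along the computed direction
t = 1.0
while f([x[0] + t * step[0], x[1] + t * step[1]]) > f(x) + 1e-4 * t * dot(g, step):
    t *= 0.5
x_new = [x[0] + t * step[0], x[1] + t * step[1]]
print(f(x), "->", f(x_new))
```

Even with the perturbed Hessian and a loose subproblem tolerance, the step still produces a large decrease in the objective, which is the qualitative behavior the abstract claims: convergence is retained regardless of the degree of Hessian approximation and subproblem accuracy.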