Trust-region methods (TR) can converge quadratically to minima where the Hessian is positive definite. However, if the minima are not isolated, then the Hessian there cannot be positive definite. The weaker Polyak–Łojasiewicz (PŁ) condition is compatible with non-isolated minima, and it is enough for many algorithms to preserve good local behavior. Yet, TR with an exact subproblem solver lacks even basic features such as a capture theorem under PŁ. In practice, a popular inexact subproblem solver is the truncated conjugate gradient method (tCG). Empirically, TR-tCG exhibits superlinear convergence under PŁ. We confirm this theoretically. The main mathematical obstacle is that, under PŁ, at points arbitrarily close to minima, the Hessian has vanishingly small, possibly negative eigenvalues. Thus, tCG is applied to ill-conditioned, indefinite systems. Yet, the core theory underlying tCG is that of CG, which assumes a positive definite operator. Accordingly, we develop new tools to analyze the dynamics of CG in the presence of small eigenvalues of any sign, for the regime of interest to TR-tCG.
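For readers unfamiliar with tCG, the following is a minimal sketch of the textbook Steihaug–Toint truncated CG iteration for the trust-region subproblem, written with NumPy. It is not the authors' implementation. The names `truncated_cg` and `_to_boundary`, the absolute residual tolerance `tol`, and the returned status strings are assumptions made for illustration. The sketch shows where indefiniteness enters: the loop runs ordinary CG until it meets a direction of nonpositive curvature or leaves the trust region, and in either case it stops on the boundary.

```python
import numpy as np


def _to_boundary(s, p, radius):
    """Return tau >= 0 such that ||s + tau * p|| == radius (assumes ||s|| <= radius)."""
    a = p @ p
    b = 2.0 * (s @ p)
    c = s @ s - radius**2  # c <= 0, so the discriminant below is nonnegative
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def truncated_cg(H, g, radius, tol=1e-10, max_iter=None):
    """Approximately minimize g.s + 0.5 * s.H.s subject to ||s|| <= radius.

    H is a symmetric matrix (possibly indefinite), g is the gradient.
    Returns the step s and a status string (hypothetical labels).
    """
    n = g.shape[0]
    max_iter = n if max_iter is None else max_iter
    s = np.zeros(n)
    r = g.astype(float)  # residual H s + g at s = 0 (fresh copy)
    p = -r               # first search direction: steepest descent
    if np.linalg.norm(r) <= tol:
        return s, "converged"
    for _ in range(max_iter):
        Hp = H @ p
        curvature = p @ Hp
        if curvature <= 0.0:
            # CG theory no longer applies: follow p to the boundary.
            return s + _to_boundary(s, p, radius) * p, "negative_curvature"
        alpha = (r @ r) / curvature
        s_next = s + alpha * p
        if np.linalg.norm(s_next) >= radius:
            # The CG step leaves the trust region: truncate at the boundary.
            return s + _to_boundary(s, p, radius) * p, "boundary"
        r_next = r + alpha * Hp
        if np.linalg.norm(r_next) <= tol:
            return s_next, "converged"
        beta = (r_next @ r_next) / (r @ r)
        p = -r_next + beta * p
        s, r = s_next, r_next
    return s, "max_iter"
```

With a positive definite `H` and a large radius, the routine reduces to plain CG on `H s = -g`. With an indefinite `H`, it can exit through the `negative_curvature` branch, which is the situation the abstract identifies as falling outside classical CG theory. Practical implementations typically use a relative residual stopping rule rather than the fixed `tol` assumed here.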