It is well known that when exact line search gradient descent (GD) is applied to a convex quadratic objective, the worst-case rate of convergence (ROC), taken over all initial vectors, deteriorates as the condition number of the Hessian of the objective grows. On the basis of an elegant analysis due to H. Akaike, it is generally believed, but not proved, that in the ill-conditioned regime the ROC for almost all initial vectors, and hence also the average ROC, is close to the worst-case ROC. We complete Akaike's analysis using the center and stable manifold theorem. Our analysis also makes apparent the effect of an intermediate eigenvalue of the Hessian by establishing the following somewhat amusing result: in the absence of an intermediate eigenvalue, the average ROC becomes arbitrarily \emph{fast}, not slow, as the Hessian becomes increasingly ill-conditioned. We discuss in passing some contemporary applications of exact line search GD to polynomial optimization problems (POPs) arising in imaging and data science. In particular, we observe that a tailored exact line search GD algorithm for a POP arising from the phase retrieval problem is only 50\% more expensive per iteration than its constant step size counterpart, while promising a ROC matched only by the optimally tuned constant step size, which can almost never be achieved in practice.
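For concreteness, here is a standard sketch of the quadratic setup behind these statements; the notation $A$, $b$, $\kappa$ is ours and is not fixed in the abstract. For
\[
f(x) = \tfrac{1}{2} x^{\mathsf{T}} A x - b^{\mathsf{T}} x, \qquad A \succ 0, \qquad g_k = \nabla f(x_k) = A x_k - b,
\]
exact line search minimizes $\alpha \mapsto f(x_k - \alpha g_k)$ in closed form,
\[
\alpha_k = \frac{g_k^{\mathsf{T}} g_k}{g_k^{\mathsf{T}} A g_k},
\]
and Kantorovich's inequality yields the classical worst-case contraction
\[
f(x_{k+1}) - f(x^*) \le \left( \frac{\kappa - 1}{\kappa + 1} \right)^{2} \bigl( f(x_k) - f(x^*) \bigr), \qquad \kappa = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)},
\]
whose factor tends to $1$ as $\kappa \to \infty$, which is the deterioration referred to above.

The per-iteration overhead claimed for phase retrieval can likewise be made plausible by a sketch, assuming the common real-valued quartic loss (the abstract does not specify the model): for measurements $(a_i, y_i)$, $i = 1, \dots, m$, and
\[
f(x) = \frac{1}{4m} \sum_{i=1}^{m} \bigl( (a_i^{\mathsf{T}} x)^2 - y_i \bigr)^2,
\]
the line search function along $-g$, written with $u_i = a_i^{\mathsf{T}} x$ and $v_i = a_i^{\mathsf{T}} g$,
\[
\phi(\alpha) = f(x - \alpha g) = \frac{1}{4m} \sum_{i=1}^{m} \bigl( (u_i - \alpha v_i)^2 - y_i \bigr)^2,
\]
is a quartic polynomial in $\alpha$, so $\phi'$ is a cubic solvable in closed form and the exact step is chosen among at most three critical points. The only extra work over a constant step size is the additional inner products $v_i = a_i^{\mathsf{T}} g$, consistent in spirit with the 50\% figure above, though the paper's exact accounting may differ.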