Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

Non-asymptotic convergence analysis of quasi-Newton methods has gained attention with a landmark result establishing an explicit superlinear rate of $O((1/\sqrt{t})^t)$. The methods that obtain this rate, however, exhibit a well-known drawback: they must store either the previous Hessian approximation matrix or all past curvature information in order to form the current Hessian inverse approximation. Limited-memory variants of quasi-Newton methods, such as the celebrated L-BFGS, alleviate this issue by leveraging a limited window of past curvature information to construct the Hessian inverse approximation. As a result, their per-iteration complexity and storage requirement are $O(\tau d)$, where $\tau \le d$ is the size of the window and $d$ is the problem dimension, reducing the $O(d^2)$ computational cost and memory requirement of standard quasi-Newton methods. However, to the best of our knowledge, there is no result showing a non-asymptotic superlinear convergence rate for any limited-memory quasi-Newton method. In this work, we close this gap by presenting a limited-memory greedy BFGS (LG-BFGS) method that achieves an explicit non-asymptotic superlinear rate. We incorporate displacement aggregation, i.e., decorrelating projection, in post-processing gradient variations, together with a basis vector selection scheme on variable variations, which greedily maximizes a progress measure of the Hessian estimate to the true Hessian. Their combination allows past curvature information to remain in a sparse subspace while yielding a valid representation of the full history. Interestingly, our established non-asymptotic superlinear convergence rate demonstrates a trade-off between the convergence speed and the memory requirement, which, to our knowledge, is the first of its kind. Numerical results corroborate our theoretical findings and demonstrate the effectiveness of our method.
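To make the $O(\tau d)$ cost of limited-memory methods concrete, here is a minimal sketch of the classical L-BFGS two-loop recursion. It is not the LG-BFGS method of the paper: it omits displacement aggregation and greedy basis vector selection. The function names `lbfgs_direction` and `minimize_quadratic`, the scaled-identity initial matrix, and the quadratic test problem with exact line search are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def lbfgs_direction(grad, s_list, y_list):
    """Apply the L-BFGS inverse-Hessian approximation to `grad`.

    Uses only the stored pairs (s_i, y_i), ordered oldest to newest,
    so the cost is O(tau * d) for tau pairs in dimension d.
    """
    q = np.array(grad, dtype=float)
    alphas = []
    # First loop: walk the window from the newest pair to the oldest.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / np.dot(y, s)
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append((alpha, rho))
    # Initial matrix: scaled identity (a common heuristic, assumed here).
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= np.dot(s, y) / np.dot(y, y)
    # Second loop: walk the window from the oldest pair to the newest.
    for (s, y), (alpha, rho) in zip(zip(s_list, y_list), reversed(alphas)):
        beta = rho * np.dot(y, q)
        q += (alpha - beta) * s
    return q


def minimize_quadratic(A, b, tau=3, max_iter=200, tol=1e-10):
    """Minimize 0.5 x^T A x - b^T x for symmetric positive definite A.

    Illustrative driver only: the exact line search below uses A directly,
    which costs O(d^2); the quasi-Newton part itself stays O(tau * d).
    """
    x = np.zeros(len(b), dtype=float)
    s_hist, y_hist = deque(maxlen=tau), deque(maxlen=tau)  # sliding window
    g = A @ x - b
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = lbfgs_direction(g, s_hist, y_hist)
        step = (g @ p) / (p @ A @ p)  # exact line search for a quadratic
        x_new = x - step * p
        g_new = A @ x_new - b
        s_hist.append(x_new - x)  # variable variation
        y_hist.append(g_new - g)  # gradient variation
        x, g = x_new, g_new
    return x
```

The `deque(maxlen=tau)` drops the oldest pair once the window is full, which is the plain sliding-window rule of L-BFGS. The abstract describes LG-BFGS as replacing this rule with aggregated gradient variations and greedily chosen basis vectors, so the window above should be read only as the baseline being improved upon.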

Quasi-Newton methods

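Quasi-Newton methods are among the most effective methods for solving nonlinear optimization problems. They were proposed in the 1950s by W. C. Davidon, a physicist at Argonne National Laboratory in the United States. At the time, Davidon's algorithm was regarded as one of the most inventive contributions to nonlinear optimization. Soon afterwards, R. Fletcher and M. J. D. Powell showed that the new algorithm was far faster and more reliable than other methods, and the field of nonlinear optimization advanced rapidly as a result.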
