Online Learning Guided Quasi-Newton Methods with Global Non-Asymptotic Convergence

In this paper, we propose a quasi-Newton method for solving smooth and monotone nonlinear equations, including unconstrained minimization and minimax optimization as special cases. For the strongly monotone setting, we establish two global convergence bounds: (i) a linear convergence rate that matches the rate of the celebrated extragradient method, and (ii) an explicit global superlinear convergence rate that provably surpasses the linear convergence rate after at most ${O}(d)$ iterations, where $d$ is the problem's dimension. In addition, for the case where the operator is only monotone, we prove a global convergence rate of ${O}(\min\{{1}/{k},{\sqrt{d}}/{k^{1.25}}\})$ in terms of the duality gap. This matches the rate of the extragradient method when $k = {O}(d^2)$ and is faster when $k = \Omega(d^2)$. These results are the first global convergence results to demonstrate a provable advantage of a quasi-Newton method over the extragradient method, without querying the Jacobian of the operator. Unlike classical quasi-Newton methods, we achieve this by using the hybrid proximal extragradient framework and a novel online learning approach for updating the Jacobian approximation matrices. Specifically, guided by the convergence analysis, we formulate the Jacobian approximation update as an online convex optimization problem over non-symmetric matrices, relating the regret of the online problem to the convergence rate of our method. To facilitate efficient implementation, we further develop a tailored online learning algorithm based on an approximate separation oracle, which preserves structures such as symmetry and sparsity in the Jacobian matrices.
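For context, here is a minimal sketch of the extragradient baseline that the abstract compares against, not the proposed quasi-Newton method. It runs on a toy two-dimensional strongly monotone saddle-point operator. It also includes a small helper that evaluates the stated monotone-case bound with constants dropped, to show that the two terms of the minimum coincide at $k = d^2$. The names `extragradient`, `saddle_operator`, and `monotone_rate_bound`, the test problem, and the step size are all illustrative assumptions and do not come from the paper.

```python
import math

def extragradient(F, z0, step, iters):
    """Plain extragradient: extrapolate with F(z), then update with F at the midpoint."""
    z = list(z0)
    for _ in range(iters):
        g = F(z)
        z_half = [zi - step * gi for zi, gi in zip(z, g)]   # extrapolation step
        g_half = F(z_half)
        z = [zi - step * gi for zi, gi in zip(z, g_half)]   # update step
    return z

# Toy problem (assumed): min_x max_y (MU/2) x^2 + A x y - (MU/2) y^2.
# Its operator F(x, y) = (df/dx, -df/dy) is MU-strongly monotone with root (0, 0).
MU, A = 0.1, 1.0

def saddle_operator(z):
    x, y = z
    return [MU * x + A * y, -A * x + MU * y]

def monotone_rate_bound(k, d):
    """min{1/k, sqrt(d)/k^1.25} with all constants omitted (illustration only)."""
    return min(1.0 / k, math.sqrt(d) / k ** 1.25)

# The Lipschitz constant of this linear operator is sqrt(MU^2 + A^2).
L = math.hypot(MU, A)
z_final = extragradient(saddle_operator, [1.0, 1.0], step=0.5 / L, iters=200)
final_norm = math.hypot(*z_final)   # distance to the solution (0, 0)

# Crossover check: for d = 16 the two terms of the bound agree at k = d^2 = 256.
crossover_gap = abs(1.0 / 256 - math.sqrt(16) / 256 ** 1.25)
```

The step size `0.5 / L` is one conservative choice below the usual `1 / L` threshold for extragradient. The method proposed in the paper replaces this first-order update with one guided by a learned Jacobian approximation, which this sketch does not attempt to reproduce.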



Quasi-Newton Methods


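Quasi-Newton methods are among the most effective methods for solving nonlinear optimization problems. They were proposed in the 1950s by W. C. Davidon, a physicist at Argonne National Laboratory in the United States. At the time, Davidon's algorithm was regarded as one of the most creative inventions in nonlinear optimization. Soon afterward, R. Fletcher and M. J. D. Powell demonstrated that the new algorithm was far faster and more reliable than other methods, which advanced the field of nonlinear optimization dramatically, seemingly overnight.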
