Sharpened Lazy Incremental Quasi-Newton Method

The problem of minimizing the sum of $n$ functions in $d$ dimensions is ubiquitous in machine learning and statistics. In many applications where the number of observations $n$ is large, incremental or stochastic methods are necessary because their per-iteration cost is independent of $n$. Among these, quasi-Newton (QN) methods strike a balance between per-iteration cost and convergence rate: they achieve a superlinear rate at $O(d^2)$ cost, in contrast to the linear rate of first-order methods at $O(d)$ cost and the quadratic rate of second-order methods at $O(d^3)$ cost. However, existing incremental QN methods have notable shortcomings. Incremental Quasi-Newton (IQN) exhibits only asymptotic superlinear convergence, while Incremental Greedy BFGS (IGS) offers explicit superlinear convergence but performs poorly in practice and has a per-iteration cost of $O(d^3)$. To address these issues, we introduce the Sharpened Lazy Incremental Quasi-Newton Method (SLIQN), which achieves the best of both worlds: an explicit superlinear convergence rate and superior empirical performance at an $O(d^2)$ per-iteration cost. SLIQN features two key changes. First, it uses a hybrid strategy that combines classic and greedy BFGS updates, which allows it to outperform both IQN and IGS empirically. Second, it employs a carefully chosen constant multiplicative factor together with a lazy propagation strategy, which reduces its per-iteration cost to $O(d^2)$. Finally, our experiments demonstrate the superiority of SLIQN over other incremental and stochastic quasi-Newton variants and establish its competitiveness with second-order incremental methods.
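To make the two update rules named above concrete, here is a minimal, hypothetical sketch of a classic BFGS update and a greedy BFGS update applied one after the other to a single $d \times d$ Hessian approximation $B$. It is not the SLIQN algorithm: it omits the incremental aggregation over the $n$ functions, the constant multiplicative factor, and lazy propagation, and the order of the two updates in `hybrid_update` is an illustrative assumption. The greedy step assumes access to the exact Hessian $A$, chooses the coordinate direction $e_i$ that maximizes $B_{ii}/A_{ii}$, and uses $y = A e_i$. All function names are our own. Each rank-two update costs $O(d^2)$ operations.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a: Vector, b: Vector) -> float:
    """Inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


def bfgs_rank_two(B: Matrix, s: Vector, y: Vector) -> Matrix:
    """Return B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s).

    This rank-two form is shared by the classic and greedy updates and
    costs O(d^2) operations for a d x d matrix B.
    """
    Bs = matvec(B, s)
    sBs = dot(s, Bs)
    ys = dot(y, s)
    if sBs <= 0.0 or ys <= 0.0:
        raise ValueError("BFGS update needs s^T B s > 0 and y^T s > 0")
    d = len(B)
    return [
        [B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys for j in range(d)]
        for i in range(d)
    ]


def classic_bfgs(B: Matrix, s: Vector, y: Vector) -> Matrix:
    """Classic BFGS: s = iterate difference, y = gradient difference."""
    return bfgs_rank_two(B, s, y)


def greedy_index(B: Matrix, A: Matrix) -> int:
    """Index i of the basis vector e_i maximizing B[i][i] / A[i][i]."""
    return max(range(len(B)), key=lambda i: B[i][i] / A[i][i])


def greedy_bfgs(B: Matrix, A: Matrix) -> Matrix:
    """Greedy BFGS: update along u = e_i using y = A u (A = exact Hessian, assumed known)."""
    i = greedy_index(B, A)
    u = [1.0 if k == i else 0.0 for k in range(len(B))]
    return bfgs_rank_two(B, u, matvec(A, u))


def hybrid_update(B: Matrix, s: Vector, y: Vector, A: Matrix) -> Matrix:
    """Hypothetical hybrid step: a classic update followed by a greedy one."""
    return greedy_bfgs(classic_bfgs(B, s, y), A)
```

The classic step enforces the secant condition $B_{+} s = y$, while the greedy step makes the approximation agree with $A$ along the selected coordinate, $B_{+} e_i = A e_i$. On a quadratic $f(x) = \tfrac{1}{2} x^\top A x$ one can take $y = A s$.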

翻译：最小化$d$维空间中$n$个函数之和的问题在机器学习和统计学中普遍存在。在许多观测数$n$较大的应用中，必须采用增量或随机方法，因为其每次迭代的计算成本与$n$无关。在这些方法中，拟牛顿（QN）方法在每次迭代成本与收敛速度之间取得了平衡。具体而言，与一阶方法的线性收敛速度（成本为$O(d)$）和二阶方法的二次收敛速度（成本为$O(d^3)$）相比，拟牛顿方法以$O(d^2)$的成本实现了超线性收敛速度。然而，现有增量方法存在显著缺陷：增量拟牛顿（IQN）仅具有渐近超线性收敛性；而增量贪心BFGS（IGS）虽具有显式超线性收敛性，但经验性能较差且每次迭代成本为$O(d^3)$。为解决这些问题，我们提出了锐化懒惰增量拟牛顿法（SLIQN），该方法实现了两全其美：具有显式超线性收敛速度，同时以每次迭代$O(d^2)$的成本展现出卓越的经验性能。SLIQN包含两项关键改进：首先，它采用了结合经典BFGS更新与贪心BFGS更新的混合策略，使其在经验上优于IQN和IGS。其次，它利用巧妙的常数乘数因子与惰性传播策略，从而将成本降低至$O(d^2)$。此外，我们的实验证明了SLIQN相对于其他增量和随机拟牛顿变体的优越性，并确立了其与二阶增量方法的竞争力。