Sharpened Lazy Incremental Quasi-Newton Method

The problem of minimizing the sum of $n$ functions in $d$ dimensions is ubiquitous in machine learning and statistics. In many applications where the number of observations $n$ is large, it is necessary to use incremental or stochastic methods, as their per-iteration cost is independent of $n$. Of these, Quasi-Newton (QN) methods strike a balance between the per-iteration cost and the convergence rate. Specifically, they exhibit a superlinear rate with $O(d^2)$ cost in contrast to the linear rate of first-order methods with $O(d)$ cost and the quadratic rate of second-order methods with $O(d^3)$ cost. However, existing incremental methods have notable shortcomings: Incremental Quasi-Newton (IQN) only exhibits asymptotic superlinear convergence. In contrast, Incremental Greedy BFGS (IGS) offers explicit superlinear convergence but suffers from poor empirical performance and has a per-iteration cost of $O(d^3)$. To address these issues, we introduce the Sharpened Lazy Incremental Quasi-Newton Method (SLIQN) that achieves the best of both worlds: an explicit superlinear convergence rate, and superior empirical performance at a per-iteration $O(d^2)$ cost. SLIQN features two key changes: first, it incorporates a hybrid strategy of using both classic and greedy BFGS updates, allowing it to empirically outperform both IQN and IGS. Second, it employs a clever constant multiplicative factor along with a lazy propagation strategy, which enables it to have a cost of $O(d^2)$. Additionally, our experiments demonstrate the superiority of SLIQN over other incremental and stochastic Quasi-Newton variants and establish its competitiveness with second-order incremental methods.
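As a rough illustration of the two building blocks named above, the following sketch contrasts the classic BFGS update with the greedy BFGS update in the standard textbook form. It is not the SLIQN algorithm itself: the constant multiplicative factor and the lazy propagation strategy are omitted, and the function names `classic_bfgs_update` and `greedy_bfgs_update` are hypothetical names chosen for this example. The greedy variant assumes the target matrix `A` (for instance, a component Hessian) is available and that both matrices are symmetric positive definite.

```python
import numpy as np


def classic_bfgs_update(B, s, y):
    """Classic BFGS update of a Hessian approximation B.

    s is the variable displacement and y the gradient difference.
    Assumes B is symmetric positive definite and y @ s > 0.
    Cost is O(d^2): one matrix-vector product and two outer products.
    """
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)


def greedy_bfgs_update(G, A):
    """Greedy BFGS update of G toward a target matrix A.

    The update direction is the coordinate vector e_i that maximizes
    (e_i^T G e_i) / (e_i^T A e_i), i.e. the ratio of diagonal entries.
    Assumes G and A are symmetric positive definite.
    Cost is O(d^2), since G e_i and A e_i are just columns.
    """
    i = int(np.argmax(np.diag(G) / np.diag(A)))
    Gu = G[:, i]  # G @ e_i
    Au = A[:, i]  # A @ e_i
    return G - np.outer(Gu, Gu) / G[i, i] + np.outer(Au, Au) / A[i, i]
```

Both updates cost $O(d^2)$ per call. The classic update enforces the secant condition $B_+ s = y$ along the step actually taken, while the greedy update makes the approximation agree with $A$ along the selected coordinate direction, that is, $G_+ e_i = A e_i$.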
