Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only if the stochastic Hessian noise diminishes, which drives per-iteration costs up over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that achieves superlinear convergence without higher per-iteration costs. Nonetheless, this method exhibits slow global convergence, requiring up to $\tilde{O}(\kappa^2)$ iterations to reach the superlinear rate of $\tilde{O}((1/t)^{t/2})$, where $\kappa$ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in $\tilde{O}(\kappa)$ iterations. We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework, achieving fast global and local convergence rates for strongly convex functions given access to a noisy Hessian oracle.
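To make the averaging mechanism concrete, the following is a minimal sketch of a Hessian-averaging Newton iteration; the uniform averaging weights below are an illustrative assumption, not necessarily the exact weighting used in [arXiv:2204.09266]. Given a noisy Hessian oracle $\hat{H}_t \approx \nabla^2 f(x_t)$, set
$$\bar{H}_t \;=\; \frac{1}{t+1}\sum_{s=0}^{t} \hat{H}_s \;=\; \frac{t}{t+1}\,\bar{H}_{t-1} + \frac{1}{t+1}\,\hat{H}_t, \qquad x_{t+1} \;=\; x_t - \bar{H}_t^{-1}\,\nabla f(x_t).$$
Each iteration draws a single oracle sample, so the per-iteration cost stays constant, while averaging $t+1$ independent noise terms shrinks the effective Hessian error roughly like $1/\sqrt{t}$ (for bounded, independent noise). This is what allows superlinear convergence without querying progressively more accurate, and hence costlier, Hessian estimates.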