This paper addresses the optimization problem of minimizing non-convex continuous functions, which is relevant in the context of high-dimensional machine learning applications characterized by over-parametrization. We analyze a randomized coordinate second-order method named SSCN, which can be interpreted as applying cubic regularization in random subspaces. This approach effectively reduces the computational complexity associated with utilizing second-order information, rendering it applicable in higher-dimensional scenarios. Theoretically, we establish convergence guarantees for non-convex functions, with interpolating rates for arbitrary subspace sizes and allowing inexact curvature estimation. As the subspace size increases, our complexity matches the $\mathcal{O}(\epsilon^{-3/2})$ rate of full cubic regularization (CR). Additionally, we propose an adaptive sampling scheme ensuring the exact convergence rate of $\mathcal{O}(\epsilon^{-3/2}, \epsilon^{-3})$ to a second-order stationary point, even without sampling all coordinates. Experimental results demonstrate substantial speed-ups achieved by SSCN compared to conventional first-order methods.
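To make the described method concrete, the following is a minimal sketch of one SSCN iteration under coordinate sampling, not the paper's implementation: the names (`grad`, `hess_block`, `tau`, `M`), the bisection-based solver for the cubic subproblem, and the handling of edge cases are all illustrative assumptions.

```python
# A minimal sketch of one SSCN iteration: cubic regularization applied
# to a randomly sampled coordinate subspace. All names are illustrative.
import numpy as np

def sscn_step(x, grad, hess_block, tau, M, rng):
    """One cubic-regularized Newton step on tau randomly sampled coordinates.

    grad(x): full gradient; hess_block(x, idx): tau-by-tau Hessian block on
    the sampled coordinates; M: cubic-regularization constant (an estimate
    of the Hessian's Lipschitz constant).
    """
    d = x.size
    idx = rng.choice(d, size=tau, replace=False)   # random coordinate subspace S
    g = grad(x)[idx]                               # subspace gradient  S^T grad f(x)
    H = hess_block(x, idx)                         # subspace Hessian   S^T Hess f(x) S

    # Subproblem: min_h  g^T h + 0.5 h^T H h + (M/6) ||h||^3.
    # Stationarity gives h(r) = -(H + (M r / 2) I)^{-1} g with r = ||h||,
    # so we bisect the secular equation ||h(r)|| = r over the radius r.
    # (The degenerate "hard case" is ignored in this sketch.)
    lam, U = np.linalg.eigh(H)
    gt = U.T @ g
    norm_h = lambda r: np.linalg.norm(gt / (lam + 0.5 * M * r))

    lo = max(0.0, -2.0 * lam.min() / M) + 1e-12    # keep the shifted matrix PD
    hi = max(lo, 1.0)
    while norm_h(hi) > hi:                         # bracket the root
        hi *= 2.0
    for _ in range(100):                           # bisection on the radius
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if norm_h(mid) > mid else (lo, mid)
    r = 0.5 * (lo + hi)

    h = U @ (-gt / (lam + 0.5 * M * r))
    x_new = x.copy()
    x_new[idx] += h                                # update only the sampled coordinates
    return x_new
```

Note that with `tau = d` the sampled block is the full Hessian and the step reduces to a classical cubic-regularization step, consistent with the abstract's claim that the rate interpolates up to the $\mathcal{O}(\epsilon^{-3/2})$ CR rate as the subspace size grows.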