We study the well-known Cubic Newton method in the stochastic setting and propose a general framework for incorporating variance reduction, which we call the helper framework. In all previous work, such methods were proposed with very large batches (for both gradients and Hessians) and under various, often strong, assumptions. In this work, we investigate whether these methods can be used without large batches, relying only on very simple assumptions that are sufficient for all our methods to work. In addition, we study these methods applied to gradient-dominated functions. In the general case, we show improved convergence (compared to first-order methods) to an approximate local minimum, and for gradient-dominated functions, we show convergence to approximate global minima.
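For orientation, the two objects named in the abstract can be written in their standard textbook form. This is a sketch in assumed notation, not the paper's exact statement: $g_k$ and $H_k$ stand for (possibly stochastic) gradient and Hessian estimates at $x_k$, $M > 0$ is a regularization parameter, and $\tau$ and $\alpha$ are the constants of the gradient-dominance condition. The paper's own definitions and assumptions may differ in detail.

```latex
% Cubic Newton step with gradient estimate g_k and Hessian estimate H_k
% (notation assumed for illustration)
x_{k+1} \in \operatorname*{arg\,min}_{y \in \mathbb{R}^d}
  \Big\{ \langle g_k,\, y - x_k \rangle
       + \tfrac{1}{2} \langle H_k (y - x_k),\, y - x_k \rangle
       + \tfrac{M}{6} \, \| y - x_k \|^3 \Big\}

% Gradient-dominated function of degree \alpha \in [1, 2], where f^\star = \inf_x f(x)
f(x) - f^\star \;\le\; \tau \, \| \nabla f(x) \|^{\alpha}
  \qquad \text{for all } x \in \mathbb{R}^d
```

Under the second condition every stationary point is a global minimizer, which is consistent with the abstract's claim of convergence to approximate global minima for this function class.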