The majority of machine learning methods can be regarded as the minimization of an unavailable risk function. To optimize the latter, given samples provided in a streaming fashion, we define a general stochastic Newton algorithm and its weighted average version. In several use cases, both implementations are shown not to require the inversion of a Hessian estimate at each iteration; instead, the estimate of the inverse Hessian is updated directly. This generalizes a trick introduced in [2] for the specific case of logistic regression. Under mild assumptions such as local strong convexity at the optimum, we establish almost sure convergence and rates of convergence of the algorithms, as well as central limit theorems for the constructed parameter estimates. The unified framework considered in this paper covers linear, logistic, and softmax regression, to name a few. Numerical experiments on simulated data provide empirical evidence of the relevance of the proposed methods, which outperform popular competitors, particularly in the case of bad initializations.
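To illustrate the idea of updating the inverse Hessian estimate directly rather than inverting a matrix at each iteration, the following is a minimal sketch for the logistic regression case. It is not the authors' implementation. The rank-one Sherman-Morrison formula, the identity initialization, the `floor` parameter that keeps the curvature weight away from zero, the uniform (rather than weighted) averaging, and all function names are assumptions made for illustration.

```python
import numpy as np


def sherman_morrison_update(S_inv, x, a):
    """Return (S + a * x x^T)^{-1} from S_inv = S^{-1}, assuming S_inv is symmetric.

    Costs O(d^2) per call instead of the O(d^3) of a full inversion.
    """
    Sx = S_inv @ x
    return S_inv - (a / (1.0 + a * (x @ Sx))) * np.outer(Sx, Sx)


def stochastic_newton_logistic(X, y, theta0=None, floor=1e-3):
    """One pass of a stochastic Newton sketch over streaming samples (X[i], y[i]).

    S_inv tracks the inverse of S_n = I + sum_i a_i x_i x_i^T, an unnormalized
    Hessian estimate, so the step S_inv @ gradient already carries a 1/n scaling.
    `floor` (hypothetical) bounds the curvature weight a_i from below.
    """
    n, d = X.shape
    theta = np.zeros(d) if theta0 is None else np.asarray(theta0, dtype=float).copy()
    theta_bar = theta.copy()  # running average of the iterates
    S_inv = np.eye(d)         # inverse of the initial matrix S_0 = I
    for i in range(n):
        x = X[i]
        # Clip the linear predictor to avoid overflow in exp.
        z = np.clip(x @ theta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-z))
        a = max(p * (1.0 - p), floor)
        # Direct update of the inverse: no matrix inversion is performed.
        S_inv = sherman_morrison_update(S_inv, x, a)
        # Newton-type step using the per-sample gradient (p - y_i) * x.
        theta = theta - S_inv @ ((p - y[i]) * x)
        # Uniform averaging; the weighted version would use non-uniform weights.
        theta_bar += (theta - theta_bar) / (i + 1)
    return theta, theta_bar, S_inv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_true = np.array([1.0, -1.0, 0.5])
    X = rng.normal(size=(5000, 3))
    y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)
    theta, theta_bar, _ = stochastic_newton_logistic(X, y)
    print("last iterate:", theta)
    print("averaged iterate:", theta_bar)
```

Each iteration costs O(d^2) because the inverse is refreshed with a rank-one correction. This is possible here because the per-sample Hessian of the logistic loss is a scalar multiple of x x^T.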