To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.
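
For orientation, the baseline that the abstract starts from can be sketched as follows. This is a minimal illustration of a plain subsampled Newton iteration, not the filtering-based algorithm proposed here. It assumes component functions of the form f_i(x) = exp(a_i · x), which are log-convex; the function names, the `damping` term that keeps the subsampled Hessian invertible, and the sampling scheme are all illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def subsampled_grad_hess(x, A, idx):
    """Gradient and Hessian of the average of f_i(x) = exp(a_i . x) over rows idx of A.

    The exponential-of-linear form is an assumed example of a log-convex function.
    """
    A_s = A[idx]                          # (m, d) subsample of the rows a_i
    w = np.exp(A_s @ x)                   # f_i(x) for each sampled i
    grad = A_s.T @ w / len(idx)           # mean of f_i(x) * a_i
    hess = (A_s.T * w) @ A_s / len(idx)   # mean of f_i(x) * a_i a_i^T
    return grad, hess


def stochastic_newton(x0, A, batch_size, n_iters, damping=1e-8, seed=0):
    """Plain subsampled Newton iteration (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n, d = A.shape
    for _ in range(n_iters):
        # Draw a fresh subsample of component functions at every step.
        idx = rng.choice(n, size=batch_size, replace=False)
        g, H = subsampled_grad_hess(x, A, idx)
        # Newton update using only the current subsample; the damping term
        # is an assumption that guards against a singular subsampled Hessian.
        x = x - np.linalg.solve(H + damping * np.eye(d), g)
    return x


if __name__ == "__main__":
    # Rows come in +/- pairs, so the full objective is an average of cosh terms
    # with its minimizer at the origin.
    A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    print(stochastic_newton([0.5, -0.3], A, batch_size=4, n_iters=10))
```

Each update in this sketch uses only the current subsample, so with small batches the iterates can be noisy. That is the behavior the abstract's use of the entire history of gradients and Hessians is meant to address.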