A new variant of Newton's method for empirical risk minimization is studied. At each iteration of the optimization algorithm, the gradient and Hessian of the objective function are replaced by robust estimators taken from the existing literature on robust mean estimation for multivariate data. After proving a general theorem about the convergence of successive iterates to a small ball around the population-level minimizer, the consequences of the theory for generalized linear models are studied when the data are generated from Huber's epsilon-contamination model and/or heavy-tailed distributions. An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed, which may be more appropriate for high-dimensional settings, and conjectures about the convergence of the resulting algorithm are offered. Compared with robust gradient descent, the proposed algorithm enjoys the faster convergence rates of successive iterates often achieved by second-order algorithms for convex problems, namely quadratic convergence in a neighborhood of the optimum, with a step size that may be chosen adaptively via backtracking line search.
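The iteration described above can be sketched in pure Python for one generalized linear model, logistic regression. This is a minimal illustration under stated assumptions, not the paper's implementation. The robust mean estimator is coordinate-wise median-of-means, chosen here only as a simple stand-in for the estimators the paper draws from the literature. The Newton direction is obtained by running conjugate gradient on the robust Hessian estimate plus a small damping term, which is an added assumption to keep the system positive definite. The backtracking line search uses a median-of-means estimate of the loss. All names and parameters (`robust_newton`, `n_blocks`, `damping`, `alpha`, `beta`) are hypothetical.

```python
import math
import statistics

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable logistic function.
    e = math.exp(-abs(z))
    return 1.0 / (1.0 + e) if z >= 0 else e / (1.0 + e)

def mom(values, n_blocks):
    # Median-of-means over interleaved blocks (illustrative robust mean estimator).
    k = max(1, min(n_blocks, len(values)))
    return statistics.median([sum(values[b::k]) / len(values[b::k]) for b in range(k)])

def robust_loss(X, y, theta, n_blocks):
    # Per-sample logistic loss softplus(z) - y*z, written in a stable form.
    losses = []
    for xi, yi in zip(X, y):
        z = dot(xi, theta)
        losses.append(max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z)
    return mom(losses, n_blocks)

def robust_grad_hess(X, y, theta, n_blocks, damping):
    d = len(theta)
    resid, weight = [], []
    for xi, yi in zip(X, y):
        p = sigmoid(dot(xi, theta))
        resid.append(p - yi)          # per-sample gradient is (p - y) * x
        weight.append(p * (1.0 - p))  # per-sample Hessian is p(1 - p) * x x^T
    g = [mom([r * xi[j] for r, xi in zip(resid, X)], n_blocks) for j in range(d)]
    H = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for l in range(j, d):
            H[j][l] = H[l][j] = mom([w * xi[j] * xi[l] for w, xi in zip(weight, X)], n_blocks)
        H[j][j] += damping  # hypothetical ridge term to help keep H positive definite
    return g, H

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    # Solve A x = b for symmetric positive-definite A, starting from x = 0.
    x, r = [0.0] * len(b), list(b)
    p, rs = list(r), dot(r, r)
    for _ in range(max_iter or len(b)):
        if math.sqrt(rs) < tol:
            break
        Ap = [dot(row, p) for row in A]
        pAp = dot(p, Ap)
        if pAp <= 0.0:  # A is not positive definite along p; stop early
            break
        a = rs / pAp
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def robust_newton(X, y, n_blocks=5, n_iter=20, damping=1e-6,
                  alpha=0.25, beta=0.5, max_backtracks=30, tol=1e-8):
    theta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g, H = robust_grad_hess(X, y, theta, n_blocks, damping)
        if math.sqrt(dot(g, g)) < tol:
            break
        step = conjugate_gradient(H, [-gi for gi in g])  # robust Newton direction
        slope = dot(g, step)
        if slope >= 0.0:  # not a descent direction under the robust estimates
            break
        f0 = robust_loss(X, y, theta, n_blocks)
        t = 1.0
        for _ in range(max_backtracks):  # backtracking (Armijo) line search
            cand = [th + t * s for th, s in zip(theta, step)]
            if robust_loss(X, y, cand, n_blocks) <= f0 + alpha * t * slope:
                theta = cand
                break
            t *= beta
        else:
            break  # no sufficient decrease found; stop
    return theta
```

To use it, pass rows `X` that include an intercept column (for example `[1.0, t]`) and labels `y` in {0, 1}. Interleaved blocks are used so that ordered data do not put systematically different samples into different blocks. Note that median-of-means only tolerates contamination confined to fewer than half of the blocks, so `n_blocks` should be large relative to the expected number of outliers.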