In this paper, we present a unified and general framework for analyzing the batch updating approach to nonlinear, high-dimensional optimization. The framework encompasses all batch updating approaches currently in use and applies to nonconvex as well as convex functions. Moreover, it permits the use of noise-corrupted gradients, as well as first-order approximations to the gradient (sometimes referred to as "gradient-free" approaches). By viewing the analysis of the iterations as a problem in the convergence of stochastic processes, we establish a very general theorem that subsumes most known convergence results for zeroth-order and first-order methods. The analysis of "second-order" or momentum-based methods lies outside the scope of this paper and will be studied elsewhere. However, numerical experiments indicate that momentum-based methods can fail if the true gradient is replaced by its first-order approximation, a phenomenon that calls for further theoretical analysis.
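To make the setting concrete, the following is a minimal sketch of one iteration scheme that such a framework is meant to cover. It assumes that "batch updating" means every component of θ is updated at each iteration, that the "first-order approximation" to the gradient is a coordinate-wise forward difference with perturbation size c_t, and that measurement noise is additive Gaussian. All function names (`fd_gradient`, `batch_update`, `quad`) and the schedules `alpha(t)` and `c(t)` are hypothetical illustrations, not the paper's notation or algorithm.

```python
import random


def fd_gradient(f, theta, c):
    """Hypothetical helper: forward-difference approximation to grad f(theta).

    Component i is (f(theta + c * e_i) - f(theta)) / c, which carries an
    O(c) bias relative to the true gradient.
    """
    fx = f(theta)
    grad = []
    for i in range(len(theta)):
        shifted = list(theta)
        shifted[i] += c
        grad.append((f(shifted) - fx) / c)
    return grad


def batch_update(f, theta0, steps, alpha, c, noise_std=0.0, seed=0):
    """Hypothetical sketch: update all components of theta at every step using
    an approximate gradient corrupted by (assumed) additive Gaussian noise.

    alpha(t) is the step size and c(t) the finite-difference perturbation
    at iteration t; both are supplied as schedules.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for t in range(steps):
        g = fd_gradient(f, theta, c(t))
        # Every coordinate moves at once ("batch" update in the assumed sense).
        theta = [
            th - alpha(t) * (gi + rng.gauss(0.0, noise_std))
            for th, gi in zip(theta, g)
        ]
    return theta


def quad(theta):
    """Example objective f(theta) = ||theta||^2, minimized at the origin."""
    return sum(x * x for x in theta)


if __name__ == "__main__":
    # Noisy run with decaying step sizes and decaying perturbations.
    print(batch_update(quad, [3.0, -2.0], steps=2000,
                       alpha=lambda t: 0.5 / (t + 1),
                       c=lambda t: 1.0 / (t + 1),
                       noise_std=0.1, seed=1))
```

Passing `alpha` and `c` as schedules reflects the usual trade-off in such methods: shrinking c_t reduces the bias of the gradient approximation, while step sizes with a divergent sum and a summable square (as in the noisy run above) average out the noise. The precise conditions required by the paper's theorem may differ from this illustration, and momentum terms are deliberately omitted, consistent with the scope stated above.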