We analyze the convergence of a nonlocal gradient descent method for minimizing a class of high-dimensional non-convex functions, in which directional Gaussian smoothing (DGS) is used to define the nonlocal gradient (also referred to as the DGS gradient). The method was first proposed in [42], where multiple numerical experiments showed that replacing the traditional local gradient with the DGS gradient can help optimizers escape local minima more easily and significantly improve their performance. However, a rigorous theory for the efficiency of the method on non-convex landscapes is lacking. In this work, we investigate the scenario where the objective function is a convex function perturbed by an oscillating noise. We provide a convergence theory under which the iterates converge exponentially to a tightened neighborhood of the solution, whose size is characterized by the noise wavelength. We also establish a correlation between the optimal values of the Gaussian smoothing radius and the noise wavelength, thus justifying the advantage of using a moderate or large smoothing radius with the method. Furthermore, if the noise level decays to zero when approaching the global minimum, we prove that DGS-based optimization converges to the exact global minimum at a linear rate, similar to standard gradient-based methods in optimizing convex functions. Several numerical experiments are provided to confirm our theory and illustrate the superiority of the approach over those based on the local gradient.
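To make the notion of a DGS gradient concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each directional derivative of the Gaussian-smoothed objective is approximated by Gauss-Hermite quadrature along the directions of an orthonormal basis. The names `dgs_gradient`, `dgs_descent`, `num_points`, and `basis`, and all default values, are hypothetical choices made for illustration.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def dgs_gradient(f, x, sigma=1.0, num_points=7, basis=None):
    """Approximate a directional-Gaussian-smoothing gradient of f at x.

    Assumption: along each direction xi, the 1D slice G(v) = f(x + v*xi)
    is smoothed with a Gaussian of standard deviation sigma, and the
    derivative of the smoothed slice at v = 0, E[G(v) * v] / sigma**2,
    is estimated with Gauss-Hermite quadrature.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    if basis is None:
        basis = np.eye(d)  # rows are orthonormal directions (illustrative default)
    t, w = hermgauss(num_points)       # nodes/weights for weight exp(-t^2)
    steps = np.sqrt(2.0) * sigma * t   # change of variables v = sqrt(2)*sigma*t
    scale = np.sqrt(2.0) / (sigma * np.sqrt(np.pi))
    derivs = np.empty(d)
    for i in range(d):
        vals = np.array([f(x + s * basis[i]) for s in steps])
        derivs[i] = scale * np.sum(w * t * vals)
    # Assemble the directional derivatives back into the original coordinates.
    return basis.T @ derivs


def dgs_descent(f, x0, lr=0.1, steps=200, sigma=1.0, num_points=7):
    """Plain gradient descent with the local gradient replaced by dgs_gradient."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = x - lr * dgs_gradient(f, x, sigma=sigma, num_points=num_points)
    return x
```

For a purely quadratic objective, the smoothed directional derivative coincides with the ordinary partial derivative, which gives a simple sanity check for the sketch. When the objective carries an oscillating perturbation, more quadrature points may be needed for the quadrature to resolve the oscillation at a given `sigma`.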