Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning because the objective is non-convex and non-separable. Alternating minimization (AM) approaches split the compositional structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on non-monotone $j$-step sufficient decrease conditions and the Kurdyka-Lojasiewicz (KL) property, which together relax the requirement that the training algorithm be a descent method. We establish detailed local convergence rates as the KL exponent $\theta$ varies over $[0,1)$. Moreover, we discuss local R-linear convergence under a stronger $j$-step sufficient decrease condition.
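For reference, the KL property with exponent $\theta$ is commonly stated as follows. This is the standard textbook form, given here as an assumption about the setting; the symbols $f$, $x^*$, $c$, $\varepsilon$, and $\eta$ are illustrative and may not match the notation used in the paper. A proper lower semicontinuous function $f$ has the KL property at $x^*$ with exponent $\theta \in [0,1)$ if there exist constants $c, \varepsilon, \eta > 0$ such that

$$
\operatorname{dist}\bigl(0, \partial f(x)\bigr) \;\ge\; c\,\bigl(f(x) - f(x^*)\bigr)^{\theta}
\quad \text{for all } x \text{ with } \|x - x^*\| < \varepsilon \text{ and } f(x^*) < f(x) < f(x^*) + \eta,
$$

where $\partial f$ denotes the limiting subdifferential. The exponent $\theta$ measures how flat $f$ is near $x^*$; smaller values of $\theta$ correspond to a sharper objective.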