In this work, we establish a linear convergence estimate for gradient descent with delay $\tau\in\mathbb{N}$ when the cost function is $\mu$-strongly convex and $L$-smooth. This result improves upon the well-known estimates of Arjevani et al. \cite{ASS} and Stich and Karimireddy \cite{SK} in that it is non-ergodic and holds under weaker assumptions on the cost function. In addition, the admissible range of the learning rate $\eta$ is extended from $\eta\leq 1/(10L\tau)$ to $\eta\leq 1/(4L\tau)$ for $\tau =1$ and $\eta\leq 3/(10L\tau)$ for $\tau \geq 2$, where $L >0$ is the Lipschitz constant of the gradient of the cost function. Furthermore, we show linear convergence of the cost function under the Polyak-{\L}ojasiewicz\,(PL) condition, for which the admissible range of the learning rate is further enlarged to $\eta\leq 9/(10L\tau)$ for large delay $\tau$. Finally, numerical experiments are provided to confirm the analytical results.
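To make the setting concrete, the following is a minimal sketch of gradient descent with a fixed delay. It assumes the standard update $x_{t+1} = x_t - \eta \nabla f(x_{t-\tau})$ with $x_s = x_0$ for $s \leq 0$; this form of the iteration and its initialization are assumptions, since they are not stated above. The function name `delayed_gradient_descent`, the quadratic test problem, and the values of `tau` and `num_steps` are hypothetical choices for illustration only, not the experiments of this work.

```python
import numpy as np


def delayed_gradient_descent(grad, x0, eta, tau, num_steps):
    """Run x_{t+1} = x_t - eta * grad(x_{t - tau}) and return all iterates.

    Assumption: for t < tau the stale point is taken to be x_0.
    """
    history = [np.asarray(x0, dtype=float)]
    for t in range(num_steps):
        stale = history[max(t - tau, 0)]  # iterate from tau steps ago
        history.append(history[-1] - eta * grad(stale))
    return history


# Illustrative problem: f(x) = 0.5 * x^T A x, which is mu-strongly convex
# and L-smooth with mu = 1 and L = 10 (the extreme eigenvalues of A).
A = np.diag([1.0, 10.0])
L = 10.0
tau = 3
eta = 3.0 / (10.0 * L * tau)  # upper end of the range stated for tau >= 2

iterates = delayed_gradient_descent(
    grad=lambda x: A @ x,
    x0=np.array([1.0, 1.0]),
    eta=eta,
    tau=tau,
    num_steps=2000,
)

# Distance to the minimizer x* = 0 at every iteration.
errors = [float(np.linalg.norm(x)) for x in iterates]
print(f"initial error: {errors[0]:.3e}, final error: {errors[-1]:.3e}")
```

Plotting `errors` on a logarithmic scale is one way to inspect whether the decay of the last iterate looks linear, in the non-ergodic sense discussed above, for a given pair of $\eta$ and $\tau$.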