This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape, and we show how linear interpolation can help by leveraging the theory of nonexpansive operators. We construct a new optimization scheme called relaxed approximate proximal point (RAPP), which is the first explicit method without anchoring to achieve last-iterate convergence rates for $\rho$-comonotone problems while only requiring $\rho > -\tfrac{1}{2L}$. The construction extends to constrained and regularized settings. By replacing the inner optimizer in RAPP, we rediscover the family of Lookahead algorithms, for which we establish convergence in cohypomonotone problems even when the base optimizer is taken to be gradient descent ascent. The range of cohypomonotone problems in which Lookahead converges is further expanded by exploiting the fact that Lookahead inherits the properties of the base optimizer. We corroborate the results with experiments on generative adversarial networks, which demonstrate the benefits of the linear interpolation present in both RAPP and Lookahead.
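To make the role of linear interpolation concrete, the following is a minimal sketch of a Lookahead-style outer loop wrapped around gradient descent ascent (GDA). It is a toy illustration, not the paper's experimental setup or its exact algorithm: the bilinear game $\min_x \max_y xy$, the step size, the number of inner steps, and the interpolation weight `lam` are all assumptions chosen for the example, and the function names are hypothetical. On this game plain GDA spirals away from the solution at the origin, while interpolating between the current iterate and the result of a few inner GDA steps contracts toward it.

```python
import math


def F(z):
    """Saddle-point operator of the toy game min_x max_y x*y (an assumed example)."""
    x, y = z
    return (y, -x)  # (df/dx, -df/dy) for f(x, y) = x * y


def gda_step(z, step_size):
    """One simultaneous gradient descent ascent step: z - step_size * F(z)."""
    g = F(z)
    return (z[0] - step_size * g[0], z[1] - step_size * g[1])


def plain_gda(z0, step_size=0.1, steps=1000):
    """Run GDA on its own, without any interpolation."""
    z = z0
    for _ in range(steps):
        z = gda_step(z, step_size)
    return z


def lookahead_gda(z0, step_size=0.1, inner_steps=5, lam=0.5, outer_steps=200):
    """Lookahead-style loop: run a few inner GDA steps, then interpolate linearly.

    All default values are illustrative assumptions, not values from the paper.
    """
    z = z0
    for _ in range(outer_steps):
        w = z
        for _ in range(inner_steps):
            w = gda_step(w, step_size)
        # Linear interpolation between the slow iterate z and the fast iterate w.
        z = ((1 - lam) * z[0] + lam * w[0], (1 - lam) * z[1] + lam * w[1])
    return z


def norm(z):
    """Euclidean distance to the solution at the origin."""
    return math.hypot(z[0], z[1])


if __name__ == "__main__":
    z0 = (1.0, 1.0)
    print("initial distance:  ", norm(z0))
    print("plain GDA distance:", norm(plain_gda(z0)))
    print("Lookahead distance:", norm(lookahead_gda(z0)))
```

Both runs use the same total number of GDA steps (1000), so the difference in behavior comes only from the interpolation step. The bilinear game is monotone, so this sketch does not exercise the cohypomonotone regime that the analysis targets; it only shows the stabilizing mechanism in the simplest setting.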