In recent years, many insights into deep learning optimisation have come from identifying the implicit regularisation effects of commonly used gradient-based optimisers. Understanding implicit regularisation not only sheds light on optimisation dynamics but can also be used to improve performance and stability across problem domains, from supervised learning to two-player games such as Generative Adversarial Networks. One avenue for finding such implicit regularisation effects has been to quantify the discretisation errors of discrete optimisers via continuous-time flows constructed by backward error analysis (BEA). The current usage of BEA has limitations: not all vector fields of the continuous-time flows obtained with BEA can be written as gradients, which hinders the construction of modified losses that reveal implicit regularisers. In this work, we provide a novel approach to using BEA and show how it can be used to construct continuous-time flows whose vector fields can be written as gradients. We then use this approach to find previously unknown implicit regularisation effects, such as those induced by multiple stochastic gradient descent steps while accounting for the exact data batches used in the updates, and those arising in general differentiable two-player games.
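To make the idea of a BEA-derived modified loss concrete, the following minimal sketch illustrates the classical first-order result for full-batch gradient descent, not the new approach proposed in this work. With learning rate h, gradient descent on a loss L stays closer to the gradient flow of the modified loss L̃ = L + (h/4)‖∇L‖² than to the gradient flow of L itself. The extra term acts as an implicit regulariser that penalises large gradient norms. The 1D toy loss L(θ) = θ⁴/4 + θ²/2, the step size, the starting point and all function names are hypothetical choices made for illustration. The "exact" continuous-time flows are approximated numerically with fine-step RK4.

```python
# Hypothetical 1D toy loss: L(theta) = theta**4 / 4 + theta**2 / 2
def grad(theta):
    """First derivative g = L'(theta)."""
    return theta**3 + theta


def hess(theta):
    """Second derivative H = L''(theta)."""
    return 3 * theta**2 + 1


def gradient_descent(theta0, h, steps):
    """Discrete optimiser: full-batch gradient descent with learning rate h."""
    theta = theta0
    for _ in range(steps):
        theta -= h * grad(theta)
    return theta


def gradient_flow_field(theta, h):
    """Vector field of the plain gradient flow, -L'(theta); h is unused."""
    return -grad(theta)


def modified_flow_field(theta, h):
    """Vector field -dL~/dtheta for L~ = L + (h/4) * L'(theta)**2.

    In 1D, d/dtheta [(h/4) * g**2] = (h/2) * H * g, so the field is a gradient.
    """
    g = grad(theta)
    return -(g + 0.5 * h * hess(theta) * g)


def integrate_rk4(field, theta0, h, t_end, substeps=1000):
    """Approximate the continuous-time flow up to t_end with classical RK4."""
    dt = t_end / substeps
    theta = theta0
    for _ in range(substeps):
        k1 = field(theta, h)
        k2 = field(theta + 0.5 * dt * k1, h)
        k3 = field(theta + 0.5 * dt * k2, h)
        k4 = field(theta + dt * k3, h)
        theta += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return theta


def discretisation_errors(theta0=1.0, h=0.05, steps=20):
    """Distance between GD after `steps` updates and each flow at t = steps * h."""
    t_end = steps * h
    theta_gd = gradient_descent(theta0, h, steps)
    err_plain = abs(theta_gd - integrate_rk4(gradient_flow_field, theta0, h, t_end))
    err_mod = abs(theta_gd - integrate_rk4(modified_flow_field, theta0, h, t_end))
    return err_plain, err_mod


if __name__ == "__main__":
    err_plain, err_mod = discretisation_errors()
    print(f"|GD - gradient flow of L|:  {err_plain:.2e}")
    print(f"|GD - gradient flow of L~|: {err_mod:.2e}")
```

Halving h while doubling the number of steps, so that t = steps × h stays fixed, should roughly halve the first error and shrink the second faster. This is consistent with the modified flow matching gradient descent to one order higher in h. A full BEA construction would add further higher-order terms.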