We present a new class of Langevin-based algorithms that overcomes many of the known shortcomings of the popular adaptive optimizers currently used for fine-tuning deep learning models. Its underpinning theory relies on recent advances in Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms while also addressing other known issues, e.g. vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we name TH$\varepsilon$O POULA (or, simply, TheoPouLa). Finally, we present several experiments with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive optimization algorithms.