Variational Learning (VL) has recently gained popularity for training deep neural networks. Part of its empirical success can be explained by theories such as PAC-Bayes bounds, minimum description length, and marginal likelihood, but little has been done to unravel the implicit regularization at play. Here, we analyze the implicit regularization of VL through the Edge of Stability (EoS) framework. EoS has previously been used to show that gradient descent can find flat solutions, and we extend this result to show that VL can find even flatter ones. This result is obtained by controlling both the shape of the variational posterior and the number of posterior samples used during training. The derivation follows the standard pattern of the EoS literature for deep learning: we first establish the result for a quadratic problem and then extend it to deep neural networks. We empirically validate these findings on a wide variety of large networks, such as ResNet and ViT, and find that the theoretical predictions closely match the empirical results. Ours is the first work to analyze the EoS dynamics of VL.
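For context, the baseline quadratic result that the derivation extends is standard in the EoS literature; the notation below, with Hessian $H$ and step size $\eta$, is ours rather than fixed by the abstract. Gradient descent on a quadratic loss evolves linearly,
\[
L(w) = \tfrac{1}{2}\, w^\top H w
\quad\Longrightarrow\quad
w_{t+1} = w_t - \eta \nabla L(w_t) = (I - \eta H)\, w_t,
\]
so the iterates stay bounded if and only if every eigenvalue $\lambda$ of $H$ satisfies $|1 - \eta\lambda| \le 1$, i.e., $\lambda_{\max}(H) \le 2/\eta$; EoS refers to training hovering at this stability threshold. Read through this lens, the abstract's claim is that VL stabilizes only at lower sharpness than gradient descent, with the reduction governed by the posterior shape and the number of posterior samples.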