Lipschitz continuity is a simple yet crucial functional property of any predictive model, because it underlies the model's robustness, generalisation, and adversarial vulnerability. Our aim is to thoroughly investigate and characterise the Lipschitz behaviour of the functions realised by neural networks. To this end, we carry out an empirical investigation across a range of settings (namely architectures, losses, optimisers, label noise, and more) by pushing the simplest and most general lower and upper bounds to their limits. Although this choice is motivated primarily by computational hardness results, it nevertheless turns out to be rather fruitful and sheds light on several fundamental and intriguing traits of the Lipschitz continuity of neural network functions, which we also supplement with suitable theoretical arguments. As a highlight of this investigation, we identify a striking double descent trend in both the upper and lower bounds on the Lipschitz constant with increasing network width, which closely aligns with the typical double descent trend in the test loss. Lastly, we touch upon the seemingly counter-intuitive decline of the Lipschitz constant in the presence of label noise.
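The abstract does not spell out which lower and upper bounds are meant. As a minimal sketch, assuming the upper bound is the product of the layers' spectral norms and the lower bound is the largest input-Jacobian norm observed over a set of sample points, the two quantities could be computed for a small bias-free ReLU network as follows. The helper names (`init_mlp`, `upper_bound`, `jacobian_at`, `lower_bound`), the layer sizes, and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Random weights for a bias-free ReLU MLP (illustrative initialisation)."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def upper_bound(weights):
    """Product of per-layer spectral norms.

    This bounds the 2-norm Lipschitz constant from above because
    ReLU is itself 1-Lipschitz.
    """
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))


def jacobian_at(weights, x):
    """Input-output Jacobian of the ReLU network at a single point x."""
    J = np.eye(x.shape[0])
    h = x
    for W in weights[:-1]:
        pre = W @ h
        mask = (pre > 0).astype(pre.dtype)   # derivative of ReLU at pre
        J = (mask[:, None] * W) @ J          # chain rule: diag(mask) @ W @ J
        h = np.maximum(pre, 0.0)
    return weights[-1] @ J                   # last layer is linear


def lower_bound(weights, X):
    """Largest Jacobian spectral norm over the sample points in X.

    The Jacobian norm at any point cannot exceed the Lipschitz constant,
    so the maximum over samples is a valid lower bound.
    """
    return max(float(np.linalg.norm(jacobian_at(weights, x), 2)) for x in X)


# Illustrative usage on random data (assumed sizes, not from the paper).
weights = init_mlp([10, 64, 64, 1], seed=0)
X = np.random.default_rng(1).standard_normal((200, 10))
lo = lower_bound(weights, X)
up = upper_bound(weights)
print(f"lower bound: {lo:.4f}, upper bound: {up:.4f}")
```

The true Lipschitz constant lies between the two printed values. Tracking both while varying the hidden width is one way to probe the kind of trend the abstract describes, although this sketch uses an untrained network and says nothing about the paper's actual experimental setup.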