Recent works have shown that modern deep learning models can exhibit a sparse double descent phenomenon: as the sparsity of the model increases, the test performance first worsens because the model overfits the training data; then the overfitting decreases, leading to an improvement in performance; and finally the model begins to forget critical information, resulting in underfitting. Such behavior prevents the use of traditional early-stopping criteria. In this work, we make three key contributions. First, we propose a learning framework that avoids this phenomenon and improves generalization. Second, we introduce an entropy measure that provides more insight into the onset of this phenomenon and enables the use of traditional stopping criteria. Third, we provide a comprehensive quantitative analysis of related factors such as re-initialization methods, model width and depth, and dataset noise. These contributions are supported by empirical evidence in typical setups. Our code is available at https://github.com/VGCQ/DSD2.
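To illustrate why a non-monotonic test curve defeats a conventional stopping rule, the following is a minimal sketch. It assumes a generic patience-based rule (a stand-in for "traditional early-stopping criteria", not the authors' framework or entropy measure) applied to test losses ordered by increasing sparsity. The function name `early_stop_index` and the loss values are hypothetical and chosen only to mimic the worsen, improve, underfit shape described above; they are not results from this work.

```python
from typing import Sequence


def early_stop_index(losses: Sequence[float], patience: int = 2) -> int:
    """Return the index of the best loss seen before a patience rule halts.

    Hypothetical helper: scans losses in order (here, increasing sparsity)
    and stops once `patience` consecutive steps fail to improve on the
    best loss observed so far.
    """
    best_idx, best = 0, losses[0]
    bad_steps = 0
    for i in range(1, len(losses)):
        if losses[i] < best:
            best, best_idx = losses[i], i
            bad_steps = 0
        else:
            bad_steps += 1
            if bad_steps >= patience:
                break  # the rule gives up before reaching later minima
    return best_idx


# Hypothetical test losses at increasing sparsity levels (%), shaped like a
# sparse double descent curve: overfitting bump, improvement, then underfitting.
sparsity = [0, 20, 40, 60, 80, 90, 95, 98]
losses = [1.00, 1.10, 1.20, 1.05, 0.85, 0.80, 0.95, 1.40]

stop = early_stop_index(losses, patience=2)
best = min(range(len(losses)), key=losses.__getitem__)
print(f"early stop picks {sparsity[stop]}% sparsity; true best is {sparsity[best]}%")
```

On this toy curve the rule halts during the initial bump and returns the dense model (index 0), while the lowest loss sits at index 5. Raising `patience` to 4 happens to recover index 5 here, but only because the length of the bump is known in advance.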