Finding the optimal size of deep learning models is a timely problem with broad impact, especially for energy-saving schemes. Recently, an unexpected phenomenon, the ``double descent'', has caught the attention of the deep learning community: as the model's size grows, performance first gets worse and then starts improving again. This raises serious questions about the optimal model size for maintaining high generalization: the model needs to be sufficiently over-parametrized, but adding too many parameters wastes training resources. Is it possible to find the best trade-off efficiently? Our work shows that the double descent phenomenon is potentially avoidable with proper conditioning of the learning problem, but a final answer is yet to be found. We empirically observe that there is hope of dodging the double descent in complex scenarios with proper regularization, as even a simple $\ell_2$ regularization already helps in this direction.