Combining empirical risk minimization with capacity control is a classical strategy in machine learning for controlling the generalization gap and avoiding overfitting as the capacity of the model class grows. Yet in modern deep learning practice, very large over-parameterized models (e.g., neural networks) are optimized to fit the training data perfectly and still generalize well. Past the interpolation point, increasing model complexity appears to lower the test error. In this tutorial, we explain the concept of double descent and its mechanisms. Section 1 sets up the classical statistical learning framework and introduces the double descent phenomenon. Through a number of examples, Section 2 introduces inductive biases, which appear to play a key role in double descent by selecting a smooth empirical risk minimizer among the multiple interpolating solutions. Finally, Section 3 explores double descent with two linear models and presents other points of view from recent related work.