Understanding how test risk scales with model complexity is a central question in machine learning. Classical theory is challenged by the learning curves observed for large, over-parameterized deep networks: capacity measures based on parameter count typically fail to account for these empirical observations. To tackle this challenge, we consider norm-based capacity measures and develop our study for random features estimators, which are widely used as simplified theoretical models for more complex networks. In this context, we provide a precise characterization of how the estimator's norm concentrates and how it governs the associated test error. Our results show that the predicted learning curve exhibits a phase transition from under- to over-parameterization, but no double descent behavior. This confirms that the more classical U-shaped behavior is recovered when capacity is measured by appropriate model norms rather than by model size. From a technical point of view, we leverage deterministic equivalence as the key tool and further develop new deterministic quantities that are of independent interest.
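As a minimal illustration of the setting, the sketch below fits ridge regression on random ReLU features and reports the fitted coefficient norm alongside the test error at several widths. Everything here (the ReLU feature map, the ridge level `lam`, the synthetic linear target) is an assumption for illustration, not the paper's exact model or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 10    # training samples, input dimension
lam = 1e-3        # ridge regularization level (assumed for this sketch)

# Synthetic linear target with additive noise (illustrative choice)
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_star

def random_features_fit(p):
    """Ridge regression on p random ReLU features.

    Returns the Euclidean norm of the fitted coefficients and the
    test mean squared error.
    """
    W = rng.normal(size=(d, p)) / np.sqrt(d)   # frozen random first-layer weights
    Z = np.maximum(X @ W, 0.0)                  # train feature matrix (n x p)
    Z_test = np.maximum(X_test @ W, 0.0)        # test feature matrix
    # Solve the ridge normal equations (Z^T Z + n*lam*I) theta = Z^T y
    theta = np.linalg.solve(Z.T @ Z + n * lam * np.eye(p), Z.T @ y)
    norm = np.linalg.norm(theta)
    test_mse = np.mean((Z_test @ theta - y_test) ** 2)
    return norm, test_mse

# Sweep from under- to over-parameterized widths (p vs n = 200)
for p in [50, 200, 800]:
    norm, err = random_features_fit(p)
    print(f"p={p:4d}  coef_norm={norm:.3f}  test_mse={err:.3f}")
```

Tracking `coef_norm` rather than `p` along such a sweep is the kind of norm-based capacity measurement the abstract refers to; the paper's contribution is a precise asymptotic characterization of these quantities, which this finite-size simulation only gestures at.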