Given a fixed budget for total model size, one must choose between training a single large model and combining the predictions of multiple smaller models. We investigate this trade-off for ensembles of random-feature ridge regression models in both the overparameterized and underparameterized regimes. Using deterministic equivalent risk estimates, we prove that when a fixed number of parameters is distributed among $K$ independently trained models, the ridge-optimized test risk increases with $K$. Consequently, a single large model achieves optimal performance. We then ask when ensembles can achieve \textit{near}-optimal performance. In the overparameterized regime, we show that, to leading order, the test error depends on ensemble size and model size only through the total feature count, so that overparameterized ensembles consistently achieve near-optimal performance. To understand underparameterized ensembles, we derive scaling laws for the test risk as a function of total parameter count when the ensemble size and parameters per ensemble member are jointly scaled according to a ``growth exponent'' $\ell$. While the optimal error scaling is always achieved by increasing model size with a fixed ensemble size, our analysis identifies conditions on the kernel and task eigenstructure under which near-optimal scaling laws can be obtained by joint scaling of ensemble size and model size.
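As an informal illustration of the setup described above, the following minimal sketch splits a fixed total feature count among $K$ independently drawn random-feature ridge regressors and averages their predictions. Everything specific here is an assumption made for illustration and is not taken from the paper: the ReLU feature map, the Gaussian data, the linear teacher, the noise level, the ridge grid, and the names `relu_features` and `ensemble_test_mse`. The ridge is tuned directly on the test set as a crude stand-in for the ridge-optimized risk, and at these small sizes the ordering across $K$ can be noisy.

```python
import numpy as np


def relu_features(X, W):
    """Random ReLU feature map, scaled by 1/sqrt(number of features)."""
    return np.maximum(X @ W, 0.0) / np.sqrt(W.shape[1])


def ensemble_test_mse(K, total_features, ridge, X_tr, y_tr, X_te, y_te, rng):
    """Test MSE of K averaged ridge regressors, each with total_features // K features."""
    N = total_features // K          # features per ensemble member
    d = X_tr.shape[1]
    pred = np.zeros(X_te.shape[0])
    for _ in range(K):
        # Each member draws its own independent random projection.
        W = rng.standard_normal((d, N)) / np.sqrt(d)
        Phi_tr = relu_features(X_tr, W)
        Phi_te = relu_features(X_te, W)
        # Ridge solution: (Phi^T Phi + ridge * I) w = Phi^T y
        A = Phi_tr.T @ Phi_tr + ridge * np.eye(N)
        w = np.linalg.solve(A, Phi_tr.T @ y_tr)
        pred += Phi_te @ w
    pred /= K                        # ensemble prediction is the plain average
    return float(np.mean((pred - y_te) ** 2))


# Hypothetical toy problem: noisy linear teacher on Gaussian inputs.
rng = np.random.default_rng(0)
d, n_train, n_test = 20, 200, 500
beta = rng.standard_normal(d) / np.sqrt(d)
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
y_tr = X_tr @ beta + 0.1 * rng.standard_normal(n_train)
y_te = X_te @ beta + 0.1 * rng.standard_normal(n_test)

total_features = 256                 # fixed budget shared by all members
ridges = np.logspace(-4, 1, 6)       # assumed grid for oracle ridge tuning

# For each ensemble size, keep the best test MSE over the ridge grid.
results = {
    K: min(
        ensemble_test_mse(K, total_features, r, X_tr, y_tr, X_te, y_te, rng)
        for r in ridges
    )
    for K in (1, 2, 4, 8)
}

for K, mse in results.items():
    print(f"K={K}, features per member={total_features // K}, best test MSE={mse:.4f}")
```

With 200 training points, $K=1$ gives one overparameterized model (256 features) while $K=8$ gives eight underparameterized members (32 features each), so the loop touches both regimes discussed above. A faithful comparison would average over many random draws and larger problem sizes.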