A core principle in statistical learning is that smoothness of the target function makes it possible to break the curse of dimensionality. However, learning a smooth function seems to require enough samples close to one another to obtain meaningful estimates of high-order derivatives, which would be hard in machine learning problems where the ratio between the number of samples and the input dimension is relatively small. By deriving new lower bounds on the generalization error, this paper formalizes this intuition, then investigates the role of constants and transitory regimes, which classical learning theory statements usually do not capture even though they play a dominant role in practice.