A core principle in statistical learning is that smoothness of the target function makes it possible to break the curse of dimensionality. However, learning a smooth function seems to require enough samples close to one another to obtain meaningful estimates of high-order derivatives, which is hard in machine learning problems where the ratio between the number of samples and the input dimension is relatively small. By deriving new lower bounds on the generalization error, this paper formalizes this intuition, before investigating the role of constants and transitory regimes, which are usually not captured by classical learning theory statements even though they play a dominant role in practice.