A scaling law refers to the observation that the test performance of a model improves as the amount of training data increases. A fast scaling law implies that one can solve machine learning problems simply by increasing the data and model sizes. Yet, in many cases, the benefit of adding more data can be negligible. In this work, we study the rate of scaling laws for nearest neighbor classifiers. We show that a scaling law can have two phases: in the first phase, the generalization error depends polynomially on the data dimension and decreases quickly, whereas in the second phase, the error depends exponentially on the data dimension and decreases slowly. Our analysis highlights the role that the complexity of the data distribution plays in determining the generalization error. When the data distribution is benign, our result suggests that a nearest neighbor classifier can achieve a generalization error that depends polynomially, rather than exponentially, on the data dimension.
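To make the notion of an empirical scaling curve concrete, the following is a minimal sketch, not the analysis from this work. It measures the test error of a 1-nearest-neighbor classifier as the training set grows. The synthetic distribution (uniform points in a cube, labeled by the sign of the first coordinate) and the names `one_nn_predict`, `sample`, and `error_curve` are assumptions chosen purely for illustration.

```python
import numpy as np


def one_nn_predict(train_x, train_y, test_x):
    """Label each test point with the label of its nearest training point."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return train_y[d2.argmin(axis=1)]


def sample(rng, n, dim):
    """Hypothetical toy distribution: uniform on [-1, 1]^dim,
    label 1 if the first coordinate is positive, else 0."""
    x = rng.uniform(-1.0, 1.0, size=(n, dim))
    y = (x[:, 0] > 0).astype(int)
    return x, y


def error_curve(sizes, dim=2, n_test=500, seed=0):
    """Return the 1-NN test error for each training-set size in `sizes`."""
    rng = np.random.default_rng(seed)
    test_x, test_y = sample(rng, n_test, dim)
    errors = []
    for n in sizes:
        train_x, train_y = sample(rng, n, dim)
        pred = one_nn_predict(train_x, train_y, test_x)
        errors.append(float((pred != test_y).mean()))
    return errors


if __name__ == "__main__":
    sizes = [10, 100, 1000]
    for n, err in zip(sizes, error_curve(sizes)):
        print(f"n={n:5d}  test error={err:.3f}")
```

Rerunning `error_curve` with a larger `dim` gives a rough feel for how the data dimension affects the rate at which the error falls, although this toy setup is not meant to reproduce the two-phase behavior described above.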