We exploit a formal correspondence between thermodynamics and inference, in which the number of samples plays the role of the inverse temperature, to define a "learning capacity", a measure of the effective dimensionality of a model. We show that the learning capacity is a tiny fraction of the number of parameters for many deep networks trained on typical datasets, that it depends on the number of samples used for training, and that it is numerically consistent with notions of capacity obtained from the PAC-Bayesian framework. The test error as a function of the learning capacity does not exhibit double descent. We show that the learning capacity of a model saturates at very small and very large sample sizes; this provides guidelines as to whether one should procure more data or search for new architectures to improve performance. We also show how the learning capacity can be used to understand the effective dimensionality even of non-parametric models such as random forests and $k$-nearest neighbor classifiers.
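
A minimal sketch of the correspondence, in notation assumed here for illustration rather than taken from the text above: let $\hat{H}(w)$ be the average negative log-likelihood of the $N$ training samples under weights $w$, and let $\varphi(w)$ be a prior over weights. Treating $N$ as the inverse temperature $\beta$, the standard heat-capacity identity $C = \beta^2 \partial_\beta^2 \log Z$ (with $k_B = 1$) suggests the analogous quantities

$$
Z(N) = \int \varphi(w)\, e^{-N \hat{H}(w)}\, dw, \qquad
U(N) = -\partial_N \log Z(N), \qquad
C(N) = -N^2\, \partial_N U(N) = N^2\, \partial_N^2 \log Z(N).
$$

As a sanity check under further assumptions (a regular model with $K$ parameters, a unique minimum $w^*$, and $\hat{H}$ held fixed as $N$ varies), a Laplace approximation at large $N$ gives $\log Z(N) \approx -N \hat{H}(w^*) - \tfrac{K}{2} \log N + \text{const}$, so that $\partial_N^2 \log Z(N) \approx K/(2N^2)$ and $C \approx K/2$. In this reading, a capacity far below $K/2$ would indicate that the data constrain only a few directions in weight space.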