Model-agnostic Measure of Generalization Difficulty

The measure of a machine learning algorithm is the difficulty of the tasks it can perform, and sufficiently difficult tasks are critical drivers of strong machine learning models. However, quantifying the generalization difficulty of machine learning benchmarks has remained challenging. We propose what is to our knowledge the first model-agnostic measure of the inherent generalization difficulty of tasks. Our inductive bias complexity measure quantifies the total information required to generalize well on a task minus the information provided by the data. It does so by measuring the fractional volume occupied by hypotheses that generalize on a task given that they fit the training data. It scales exponentially with the intrinsic dimensionality of the space over which the model must generalize but only polynomially in resolution per dimension, showing that tasks which require generalizing over many dimensions are drastically more difficult than tasks involving more detail in fewer dimensions. Our measure can be applied to compute and compare supervised learning, reinforcement learning and meta-learning generalization difficulties against each other. We show that applied empirically, it formally quantifies intuitively expected trends, e.g. that in terms of required inductive bias, MNIST < CIFAR10 < Imagenet and fully observable Markov decision processes (MDPs) < partially observable MDPs. Further, we show that classification of complex images $<$ few-shot meta-learning with simple images. Our measure provides a quantitative metric to guide the construction of more complex tasks requiring greater inductive bias, and thereby encourages the development of more sophisticated architectures and learning algorithms with more powerful generalization capabilities.

翻译：机器学习算法的衡量标准在于其能执行任务的难度，而足够困难的任务是推动强大机器学习模型发展的关键驱动力。然而，量化机器学习基准测试的泛化难度始终是一个挑战。我们提出了据我们所知首个与模型无关的、对任务固有泛化难度的度量方法。我们的归纳偏差复杂度度量量化了在任务上实现良好泛化所需的总信息减去数据所提供的信息。其计算方式为：在给定拟合训练数据的条件下，任务上能实现泛化的假设所占的体积分数。该度量随模型需泛化的空间内在维度呈指数增长，但仅随每个维度的分辨率呈多项式增长，这表明需要跨多个维度泛化的任务远难于涉及较少维度但细节更丰富的任务。我们的度量可应用于计算和比较监督学习、强化学习及元学习的泛化难度。实证应用表明，它能正式量化直观预期的趋势，例如在所需归纳偏差方面，MNIST < CIFAR10 < ImageNet，且完全可观测马尔可夫决策过程（MDPs）< 部分可观测MDPs。此外，我们还发现复杂图像分类 < 基于简单图像的少样本元学习。该度量提供了一种定量指标，用于指导构建需要更大归纳偏差的更复杂任务，从而促进开发具有更强泛化能力的更先进架构和学习算法。