No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets have high complexity, real-world problems disproportionately generate low-complexity data, and we argue that neural network models share this same preference, formalized using Kolmogorov complexity. Notably, we show that architectures designed for a particular domain, such as computer vision, can compress datasets on a variety of seemingly unrelated domains. Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences. Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention, such as picking an appropriately sized model when labeled data is scarce or plentiful, can be automated into a single learning algorithm. These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
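The contrast between uniformly sampled data and structured data can be illustrated with a rough sketch. Kolmogorov complexity is uncomputable, so the sketch uses the output length of an off-the-shelf compressor, which only upper-bounds it up to an additive constant. This is not the procedure used in the experiments described above; the choice of `zlib`, the helper name `compression_ratio`, and the periodic byte sequence standing in for "real-world" data are all assumptions made for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by raw size; lower means more compressible."""
    return len(zlib.compress(data, 9)) / len(data)


n = 4096
rng = random.Random(0)  # fixed seed so the run is repeatable

# Bytes drawn uniformly at random: almost surely incompressible.
uniform_bytes = bytes(rng.getrandbits(8) for _ in range(n))

# A simple periodic sequence standing in for structured, low-complexity data
# (an illustrative assumption, not a real dataset).
structured_bytes = bytes(i % 16 for i in range(n))

ratio_uniform = compression_ratio(uniform_bytes)
ratio_structured = compression_ratio(structured_bytes)

print(f"uniform:    {ratio_uniform:.3f}")
print(f"structured: {ratio_structured:.3f}")
```

Under these assumptions, the uniformly sampled bytes should not shrink at all, while the periodic sequence should compress to a small fraction of its raw size. A general-purpose compressor is a crude proxy, so the gap indicates the direction of the effect rather than a measurement of complexity.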