No free lunch theorems for supervised learning state that no learner can solve all problems, or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often cited in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets have high complexity, real-world problems disproportionately generate low-complexity data, and we argue that neural network models share the same preference, which we formalize using Kolmogorov complexity. Notably, we show that architectures designed for a particular domain, such as computer vision, can compress datasets from a variety of seemingly unrelated domains. Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences. Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention, such as picking an appropriately sized model when labeled data is scarce or plentiful, can be automated into a single learning algorithm. These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
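The contrast between uniformly sampled and structured data can be illustrated with a small sketch. Kolmogorov complexity is uncomputable, so the example below uses the compressed size produced by `zlib` as a rough upper-bound proxy; this is an assumption made for illustration, not the measurement procedure used in the paper. The sequence length, the seed, and the repeating pattern are likewise hypothetical choices.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by raw size.

    The compressed length only upper-bounds Kolmogorov complexity
    (up to a constant), so this is a crude proxy, not the true value.
    """
    return len(zlib.compress(data, 9)) / len(data)


n = 100_000  # sequence length in bytes (illustrative choice)
rng = random.Random(0)  # fixed seed, used only for reproducibility

# Bytes drawn uniformly at random: almost all such sequences are incompressible.
uniform = bytes(rng.getrandbits(8) for _ in range(n))

# A structured, low-complexity sequence: a 16-byte pattern repeated.
structured = bytes(i % 16 for i in range(n))

uniform_ratio = compressed_ratio(uniform)
structured_ratio = compressed_ratio(structured)

print(f"uniform:    {uniform_ratio:.3f}")
print(f"structured: {structured_ratio:.3f}")
```

Under these assumptions the uniform sequence should compress to roughly its original size, while the repeating sequence should shrink to a small fraction of it. A general-purpose compressor misses many kinds of structure, so a ratio near 1 does not by itself show that data is truly high-complexity.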