Explaining out-of-distribution generalization has been a central problem in epistemology since Goodman's "grue" puzzle in 1946. Today it's a central problem in machine learning, including AI alignment. Here we propose a principled account of OOD generalization with three main ingredients. First, the world is always presented to experience not as an amorphous mass, but via distinguished features (for example, visual and auditory channels). Second, Occam's Razor favors hypotheses that are "sparse," meaning that they depend on as few features as possible. Third, sparse hypotheses will generalize from a training to a test distribution, provided the two distributions sufficiently overlap on their restrictions to the features that are either actually relevant or hypothesized to be. The two distributions could diverge arbitrarily on other features. We prove a simple theorem that formalizes the above intuitions, generalizing the classic sample complexity bound of Blumer et al. to an OOD context. We then generalize sparse classifiers to subspace juntas, where the ground truth classifier depends solely on a low-dimensional linear subspace of the features.
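The third ingredient can be illustrated with a small simulation. The sketch below is not the construction or the theorem from the paper; it is a toy setup under stated assumptions: Boolean features, a ground truth that is the XOR of the first two features, and a hypothetical Occam-style learner (`fit_sparsest`) that returns the smallest feature subset consistent with the training labels. The training and test distributions agree on the two relevant features but are deliberately far apart on the irrelevant ones. All function names and parameter values are illustrative.

```python
import itertools
import random


def sample(n_points, n_features, irrelevant_bias, rng):
    """Draw Boolean points: features 0 and 1 are uniform, while every
    other feature equals 1 with probability `irrelevant_bias`."""
    points = []
    for _ in range(n_points):
        x = [rng.randint(0, 1), rng.randint(0, 1)]
        x += [1 if rng.random() < irrelevant_bias else 0
              for _ in range(n_features - 2)]
        points.append(tuple(x))
    return points


def truth(x):
    """Assumed ground truth: depends only on features 0 and 1."""
    return x[0] ^ x[1]


def fit_sparsest(points, labels, n_features, max_k):
    """Return (subset, table) for the smallest feature subset on which
    the labels are a consistent function, or None if none exists."""
    for k in range(max_k + 1):
        for subset in itertools.combinations(range(n_features), k):
            table = {}
            consistent = True
            for x, y in zip(points, labels):
                key = tuple(x[i] for i in subset)
                # setdefault stores y the first time a key is seen
                if table.setdefault(key, y) != y:
                    consistent = False
                    break
            if consistent:
                return subset, table
    return None


def predict(model, x):
    subset, table = model
    # Unseen restrictions fall back to label 0 (an arbitrary choice).
    return table.get(tuple(x[i] for i in subset), 0)


def run_demo(seed=0):
    rng = random.Random(seed)
    n_features = 10
    # Train: irrelevant features are mostly 0. Test: mostly 1.
    train = sample(200, n_features, irrelevant_bias=0.1, rng=rng)
    test = sample(200, n_features, irrelevant_bias=0.9, rng=rng)
    model = fit_sparsest(train, [truth(x) for x in train], n_features, max_k=3)
    accuracy = sum(predict(model, x) == truth(x) for x in test) / len(test)
    return model[0], accuracy


if __name__ == "__main__":
    subset, accuracy = run_demo()
    print("selected features:", subset)
    print("test accuracy under shift:", accuracy)
```

Because the learned hypothesis reads only the selected features, its test error depends only on how the two distributions compare on that restriction, which is the intuition the abstract describes. A learner that consulted all ten features would have no such protection against the shift in the other eight.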