We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from a training distribution $D$ and unlabeled samples from a test distribution $D'$, and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between $D$ and $D'$. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, in which we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from $D$ and $D'$ pass an associated test; moreover, the test must accept if the marginal of $D$ equals the marginal of $D'$. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on $\{\pm 1\}^d$. Prior to our work, no efficient algorithms were known for these basic cases without strong assumptions on $D'$. For halfspaces in the realizable case (where there exists a halfspace consistent with both $D$ and $D'$), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.
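To make the two requirements of the model explicit, one way to state the guarantee is the following sketch (the notation is assumed rather than fixed above: $h$ is the output classifier, $\mathrm{err}_{D'}(h)$ its test error, $\mathsf{opt}$ the best test error achievable within the concept class, and $\varepsilon, \delta$ the accuracy and confidence parameters; the exact error benchmark in the non-realizable case is an assumption of this sketch):

$$\textbf{(Soundness)}\;\; \Pr\big[\text{test accepts} \,\wedge\, \mathrm{err}_{D'}(h) > \mathsf{opt} + \varepsilon\big] \le \delta, \qquad \textbf{(Completeness)}\;\; \text{marginal of } D = \text{marginal of } D' \;\implies\; \Pr[\text{test accepts}] \ge 1-\delta.$$

Acceptance of the test thus serves as the certificate: the learner may reject under an adversarial shift, but it may not reject (except with small probability) when there is no covariate shift at all.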