We give the first provably efficient algorithms for learning neural networks with distribution shift. We work in the Testable Learning with Distribution Shift framework (TDS learning) of Klivans et al. (2024), where the learner receives labeled examples from a training distribution and unlabeled examples from a test distribution, and must either output a hypothesis with low test error or reject if distribution shift is detected. No assumptions are made on the test distribution. All prior work in TDS learning focuses on classification, while here we must handle the setting of nonconvex regression. Our results apply to real-valued networks with arbitrary Lipschitz activations and work whenever the training distribution has strictly sub-exponential tails. For training distributions that are bounded and hypercontractive, we give a fully polynomial-time algorithm for TDS learning of one-hidden-layer networks with sigmoid activations. We achieve this by importing classical kernel methods into the TDS framework using data-dependent feature maps and a type of kernel matrix that couples samples from both train and test distributions.
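To make the test-then-learn pattern concrete, below is a minimal, hypothetical sketch and not the paper's actual algorithm: an RBF kernel stands in for the data-dependent feature map, an MMD-style two-sample statistic computed from the coupled train/test kernel matrix stands in for the paper's shift test, and kernel ridge regression stands in for the learning step. The names `tds_learn` and `shift_threshold` are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def tds_learn(X_train, y_train, X_test, shift_threshold=0.1, ridge=1e-3):
    n = len(X_train)
    # Couple labeled train samples and unlabeled test samples
    # in a single kernel matrix, as the abstract describes.
    Z = np.vstack([X_train, X_test])
    K = rbf_kernel(Z, Z)
    K_tr, K_te, K_cross = K[:n, :n], K[n:, n:], K[:n, n:]
    # Testing phase (stand-in): an MMD-style statistic from the coupled
    # kernel matrix; a large value signals distribution shift.
    mmd2 = K_tr.mean() + K_te.mean() - 2.0 * K_cross.mean()
    if mmd2 > shift_threshold:
        return None  # reject: distribution shift detected
    # Learning phase (stand-in): kernel ridge regression on the train data.
    alpha = np.linalg.solve(K_tr + ridge * np.eye(n), y_train)
    return lambda X: rbf_kernel(X, X_train) @ alpha

# Usage: h = tds_learn(Xtr, ytr, Xte); if h is not None, predict with h(Xte).
```

The key structural point the sketch preserves is that the tester only needs unlabeled test samples, and the output is either a real-valued hypothesis or a rejection.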