We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan (2023). In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes an associated test, and training sets drawn from some target distribution -- e.g., the Gaussian -- must pass the test. This model is more challenging than distribution-specific agnostic or Massart noise models where the learner is allowed to fail arbitrarily if the distributional assumption does not hold. We consider the setting where the target distribution is Gaussian (or more generally any strongly log-concave distribution) in $d$ dimensions and the noise model is either Massart or adversarial (agnostic). For Massart noise our tester-learner runs in polynomial time and outputs a hypothesis with error $\mathsf{opt} + \epsilon$, which is information-theoretically optimal. For adversarial noise our tester-learner has error $\tilde{O}(\mathsf{opt}) + \epsilon$ and runs in quasipolynomial time. Prior work on testable learning ignores the labels in the training set and checks that the empirical moments of the covariates are close to the moments of the base distribution. Here we develop new tests of independent interest that make critical use of the labels and combine them with the moment-matching approach of Gollakota et al. (2023). This enables us to simulate a variant of the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using nonconvex SGD but in the testable learning setting.
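The label-oblivious moment-matching check attributed above to prior work can be sketched as follows. This is a minimal illustration under stated assumptions, not the tester from the paper: it compares only moments of degree at most 2 against the standard Gaussian $N(0, I_d)$, it ignores the labels entirely, and the names `moment_test` and `tol` are hypothetical.

```python
import numpy as np


def moment_test(X, tol):
    """Accept iff the empirical first and second moments of the rows of X
    are entrywise within `tol` of those of the standard Gaussian N(0, I).

    Illustrative only: testers in the literature check higher-degree
    moments, and the tests described in the abstract also use the labels.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mean = X.mean(axis=0)        # empirical E[x], target is 0
    second = X.T @ X / n         # empirical E[x x^T], target is I_d
    mean_ok = np.max(np.abs(mean)) <= tol
    second_ok = np.max(np.abs(second - np.eye(d))) <= tol
    return bool(mean_ok and second_ok)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaussian = rng.standard_normal((200_000, 3))
    print(moment_test(gaussian, tol=0.05))        # Gaussian sample
    print(moment_test(gaussian + 1.0, tol=0.05))  # shifted sample
```

A sample drawn from the target distribution should pass with high probability (completeness), while a sample whose low-degree moments are far from the target is rejected. Passing this check alone does not certify near-optimal accuracy; that is the role of the additional label-dependent tests the abstract describes.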