We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan (2023). In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes an associated test, and training sets drawn from some target distribution -- e.g., the Gaussian -- must pass the test. This model is more challenging than distribution-specific agnostic or Massart noise models where the learner is allowed to fail arbitrarily if the distributional assumption does not hold. We consider the setting where the target distribution is Gaussian (or more generally any strongly log-concave distribution) in $d$ dimensions and the noise model is either Massart or adversarial (agnostic). For Massart noise, our tester-learner runs in polynomial time and outputs a hypothesis with (information-theoretically optimal) error $\mathsf{opt} + \epsilon$ for any strongly log-concave target distribution. For adversarial noise, our tester-learner obtains error $O(\mathsf{opt}) + \epsilon$ in polynomial time when the target distribution is Gaussian; for strongly log-concave distributions, we obtain $\tilde{O}(\mathsf{opt}) + \epsilon$ in quasipolynomial time. Prior work on testable learning ignores the labels in the training set and checks that the empirical moments of the covariates are close to the moments of the base distribution. Here we develop new tests of independent interest that make critical use of the labels and combine them with the moment-matching approach of Gollakota et al. (2023). This enables us to simulate a variant of the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using nonconvex SGD but in the testable learning setting.
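To make the label-oblivious baseline concrete, the following is a minimal sketch of a moment-matching check of the kind the abstract attributes to prior work: it compares low-degree empirical moments of the covariates against those of the standard Gaussian. It is an illustration under stated assumptions, not the tester from this paper or from Gollakota et al. (2023). The function names, the degree bound `max_degree`, and the tolerance `tol` are hypothetical choices, and the new label-dependent tests described above are not shown.

```python
import itertools
import math

import numpy as np


def gaussian_moment(exponents):
    """E[prod_i x_i^{a_i}] for x ~ N(0, I): prod_i (a_i - 1)!! if every a_i is even, else 0."""
    value = 1.0
    for a in exponents:
        a = int(a)
        if a % 2 == 1:
            return 0.0
        # (a - 1)!! = 1 * 3 * ... * (a - 1); the empty product (a == 0) is 1.
        value *= math.prod(range(1, a, 2))
    return value


def moment_matching_test(X, max_degree=2, tol=0.1):
    """Accept iff every empirical monomial moment of degree <= max_degree
    is within tol of the corresponding standard Gaussian moment.

    Assumption: tol and max_degree are illustrative, not values from the paper.
    Labels are ignored entirely, as in the prior approach the abstract describes.
    """
    _, d = X.shape
    for degree in range(1, max_degree + 1):
        # Each multiset of coordinate indices corresponds to one monomial.
        for idx in itertools.combinations_with_replacement(range(d), degree):
            exponents = np.bincount(idx, minlength=d)
            empirical = np.mean(np.prod(X ** exponents, axis=1))
            if abs(empirical - gaussian_moment(exponents)) > tol:
                return False
    return True


rng = np.random.default_rng(0)
X_gauss = rng.standard_normal((20000, 3))  # drawn from the target distribution
X_shifted = X_gauss + 0.5                  # violates the mean-zero assumption

print(moment_matching_test(X_gauss))
print(moment_matching_test(X_shifted))
```

With this many samples, the Gaussian sample should be accepted and the shifted one rejected, since its empirical mean differs from zero by far more than `tol`. The number of monomials grows as roughly $d^{k}$ for degree $k$, so a check of this form is only practical for small degree.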