Inspired by recent work on learning with distribution shift, we give a general outlier removal algorithm called iterative polynomial filtering and show a number of striking applications for supervised learning with contamination: (1) We show that any function class that can be approximated by low-degree polynomials with respect to a hypercontractive distribution can be efficiently learned under bounded contamination (also known as nasty noise). This is a surprising resolution to a longstanding gap between the complexity of agnostic learning and learning with contamination, as it was widely believed that low-degree approximators only implied tolerance to label noise. In particular, it implies the first efficient algorithm for learning halfspaces with $η$-bounded contamination up to error $2η+ε$ with respect to the Gaussian distribution. (2) For any function class that admits the (stronger) notion of sandwiching approximators, we obtain near-optimal learning guarantees even with respect to heavy additive contamination, where far more than $1/2$ of the training set may be added adversarially. Prior related work held only for regression and in a list-decodable setting. (3) We obtain the first efficient algorithms for tolerant testable learning of functions of halfspaces with respect to any fixed log-concave distribution. Even the non-tolerant case for a single halfspace in this setting had remained open. These results significantly advance our understanding of efficient supervised learning under contamination, a setting that has been much less studied than its unsupervised counterpart.
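To make the contamination setting concrete, the following is a minimal sketch, assuming the common reading of $η$-bounded contamination in which an adversary may replace up to an $η$ fraction of the clean labeled samples with arbitrary points and labels. The function names, the toy adversary, and all parameter values are illustrative assumptions. The sketch simulates the noise model only and is not the paper's iterative polynomial filtering algorithm.

```python
import random

def sample_clean(n, w, rng):
    """Draw n points x ~ N(0, I) and label each by the halfspace sign(<w, x>)."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w]
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        data.append((x, y))
    return data

def contaminate(data, eta, rng):
    """Replace at most floor(eta * n) samples with adversarially chosen ones.

    Toy adversary (an assumption for illustration): it pushes the chosen
    points far from the origin and flips their labels.
    """
    n = len(data)
    budget = int(eta * n)
    corrupted = list(data)
    for i in rng.sample(range(n), budget):
        x, y = corrupted[i]
        corrupted[i] = ([10.0 * xi for xi in x], -y)
    return corrupted, budget

rng = random.Random(0)
clean = sample_clean(1000, [1.0, -2.0, 0.5], rng)
noisy, budget = contaminate(clean, 0.1, rng)

# Count how many samples differ between the clean and contaminated sets.
changed = sum(1 for a, b in zip(clean, noisy) if a != b)
```

A learner in this setting sees only `noisy` and must still output a hypothesis with small error on the clean distribution; the abstract's $2η+ε$ guarantee for halfspaces refers to that error.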


