We study the problem of learning a drifting concept in the presence of Massart noise. In this framework, an online learner has access to a history of independent samples whose labels are noisy versions of a target concept that may change from round to round. The goal is to output, in each round, a hypothesis with small prediction error. We study the complexity of this learning problem for the fundamental class of margin-separable linear classifiers (halfspaces). On the positive side, we give a computationally efficient learner achieving error $η + \tilde{O}(Δ^{1/3}/γ)$, where $η$ upper bounds the Massart noise rate, $Δ$ is the drift rate, and $γ$ is the margin. Interestingly, in the realizable setting, an adaptation of our techniques yields an efficient learner with an improved error rate over prior work. On the lower-bound side, we provide formal evidence of an information-computation tradeoff, strongly suggesting that our algorithm's performance is essentially optimal. Specifically, while the information-theoretically optimal error scales with $Δ^{1/2}$, we prove that $Δ^{1/3}$-scaling is unavoidable for low-degree polynomial tests, even in the special case of random classification noise.
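To make the gap between the two rates concrete, here is a purely illustrative calculation. It ignores constants, logarithmic factors, and the dependence on $γ$, and the value $Δ = 10^{-6}$ is an assumption chosen for round numbers, not a value taken from the results above:

$$
Δ = 10^{-6}: \qquad Δ^{1/2} = 10^{-3}, \qquad Δ^{1/3} = 10^{-2}, \qquad \frac{Δ^{1/3}}{Δ^{1/2}} = Δ^{-1/6} = 10.
$$

Under these simplifications, the drift-dependent error term at the $Δ^{1/3}$ rate would be about ten times the one at the information-theoretic $Δ^{1/2}$ rate for this drift rate, and the ratio $Δ^{-1/6}$ grows as $Δ \to 0$.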