Understanding the noise tolerance of learning algorithms under suitable conditions is a central question in learning theory. We study computationally efficient PAC learning of halfspaces in the presence of malicious noise, where an adversary can corrupt both the instances and the labels of training samples. The best-known noise tolerance depends either on the target error rate, under distributional assumptions, or on the margin parameter, under large-margin conditions. We show that when both types of conditions hold, it is possible to achieve {\em constant} noise tolerance by minimizing a reweighted hinge loss. Our key ingredients are: 1) an efficient algorithm that finds weights to control the gradient deterioration caused by corrupted samples, and 2) a new analysis of the robustness of the hinge loss equipped with such weights.
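To make the phrase "minimizing a reweighted hinge loss" concrete, here is a minimal sketch, not the algorithm of this work. It assumes the per-sample weights `q` are already given (the weight-finding procedure is not reproduced here), uses a hypothetical margin scale `gamma`, and swaps in generic projected subgradient descent over the unit ball as the solver. All function and parameter names are illustrative.

```python
import numpy as np


def weighted_hinge_loss(w, X, y, q, gamma=1.0):
    """Reweighted hinge loss: sum_i q_i * max(0, 1 - y_i * <w, x_i> / gamma).

    q holds nonnegative per-sample weights (assumed given); gamma is an
    assumed margin scale, not a value taken from the source.
    """
    margins = y * (X @ w) / gamma
    return float(np.sum(q * np.maximum(0.0, 1.0 - margins)))


def minimize_weighted_hinge(X, y, q, gamma=1.0, steps=200, lr=0.1):
    """Projected subgradient descent on the reweighted hinge loss.

    A generic solver used for illustration only; iterates are kept
    inside the unit ball.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w) / gamma
        active = margins < 1.0  # samples with nonzero hinge subgradient
        coeff = (q[active] * y[active])[:, None]
        grad = -(X[active] * coeff).sum(axis=0) / gamma
        w = w - lr * grad
        norm = np.linalg.norm(w)
        if norm > 1.0:  # project back onto the unit ball
            w = w / norm
    return w


# Toy data: the third sample is a label-flipped copy of the first.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, -1.0, -1.0])
# Hypothetical weights that give the corrupted sample zero influence.
q = np.array([0.5, 0.5, 0.0])

w_hat = minimize_weighted_hinge(X, y, q)
loss_at_zero = weighted_hinge_loss(np.zeros(2), X, y, q)
loss_at_w_hat = weighted_hinge_loss(w_hat, X, y, q)
```

In this toy setup the zero weight removes the corrupted sample's contribution to the subgradient, so the minimizer recovers a halfspace consistent with the two clean samples. How to choose such weights efficiently when the corrupted samples are unknown is exactly what the first key ingredient addresses.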