When classifiers are deployed in the real world, users expect them to respond appropriately to their inputs. However, traditional classifiers are not equipped to handle inputs that lie far from the distribution they were trained on. Malicious actors can exploit this weakness by crafting adversarial perturbations designed to make the classifier produce an incorrect output. Classification-with-rejection methods attempt to solve this problem by allowing networks to refuse to classify inputs in which they have low confidence. This works well for strongly adversarial examples, but it also leads to the rejection of weakly perturbed images, which intuitively could still be classified correctly. To address these issues, we propose Reed-Muller Aggregation Networks (RMAggNet), a classifier inspired by Reed-Muller error-correction codes that can both correct and reject inputs. This paper shows that, by leveraging the ability to correct errors in the classification process, RMAggNet can minimise incorrectness while maintaining good correctness across multiple adversarial attacks at different perturbation budgets. It thereby provides an alternative classification-with-rejection method that can reduce the amount of additional processing in situations where a small number of incorrect classifications is permissible.
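The idea of correcting or rejecting at decode time can be illustrated with a minimal sketch. It is not RMAggNet's actual architecture, which the abstract does not describe. It assumes, purely for illustration, that each class is assigned one codeword of the first-order Reed-Muller code RM(1, 3), that a set of binary classifiers yields one predicted bit per codeword position, and that an input is rejected when the predicted bit string lies more than `max_errors` bit flips from every codeword. The names `rm1_codewords`, `decode_or_reject`, and `max_errors` are hypothetical.

```python
from itertools import product


def rm1_codewords(m):
    """Enumerate all codewords of the first-order Reed-Muller code RM(1, m).

    Each codeword is the truth table of an affine Boolean function
    a0 + a1*x1 + ... + am*xm (mod 2) evaluated at all 2**m points.
    """
    points = list(product([0, 1], repeat=m))
    words = []
    for coeffs in product([0, 1], repeat=m + 1):
        a0, rest = coeffs[0], coeffs[1:]
        word = tuple(
            (a0 + sum(a * x for a, x in zip(rest, p))) % 2 for p in points
        )
        words.append(word)
    return words


def hamming(u, v):
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def decode_or_reject(bits, codebook, max_errors):
    """Return the label of the nearest codeword, or None to signal rejection.

    Assumption: the input is rejected when no codeword lies within
    max_errors bit flips of the predicted bit string.
    """
    best_label, best_dist = None, len(bits) + 1
    for label, word in codebook.items():
        d = hamming(bits, word)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_errors else None


# Hypothetical assignment: 10 classes mapped to the first 10 of the
# 16 codewords of RM(1, 3) (length 8, minimum distance 4).
codebook = dict(enumerate(rm1_codewords(3)[:10]))
```

Because RM(1, 3) has minimum distance 4, a single flipped bit can always be corrected with `max_errors=1`, while two flipped bits leave the bit string at distance at least 2 from every codeword and so trigger rejection. Raising `max_errors` would trade fewer rejections for a higher risk of decoding to the wrong class.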