Adversarial examples pose a security threat to many critical systems built on neural networks. Certified training improves robustness, but it also reduces accuracy noticeably. Despite various proposals for addressing this issue, the significant accuracy drop remains. More importantly, it is not clear whether there is a fundamental limit on achieving robustness while maintaining accuracy. In this work, we offer a novel perspective based on Bayes error. By applying Bayes error to robustness analysis, we investigate the limit of certified robust accuracy, taking into account data distribution uncertainties. We first show that accuracy inevitably decreases in the pursuit of robustness because the Bayes error changes under the altered data distribution. We then establish an upper bound on certified robust accuracy that considers the distributions of individual classes and their boundaries. Our theoretical results are empirically evaluated on real-world datasets and are shown to be consistent with the limited success of existing certified training methods. For example, on CIFAR10 our analysis yields an upper bound on certified robust accuracy of 67.49%, whereas existing approaches have only raised it from 53.89% in 2017 to 62.84% in 2023.