Adversarial examples pose a security threat to many critical systems built on neural networks. While certified training improves robustness, it also noticeably decreases accuracy. Despite various proposals to address this issue, the significant accuracy drop remains, and, more importantly, it is unclear whether there is a fundamental limit on achieving robustness while maintaining accuracy. In this work, we offer a novel perspective based on the Bayes error. By bringing the Bayes error into robustness analysis, we investigate the limit of certified robust accuracy while accounting for uncertainty in the data distribution. We first show that accuracy inevitably decreases in the pursuit of robustness, because robustness effectively alters the data distribution and thereby changes its Bayes error. We then establish an upper bound on certified robust accuracy that accounts for the distributions of individual classes and their boundaries. Our theoretical results are empirically evaluated on real-world datasets and are consistent with the limited progress of existing certified training methods, \emph{e.g.}, for CIFAR10, our analysis yields an upper bound on certified robust accuracy of 67.49\%, whereas existing approaches have only raised it from 53.89\% in 2017 to 62.84\% in 2023.
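For reference, a minimal statement of the quantity this analysis builds on, using the standard definition of the Bayes error; the symbol $\beta$ and the notation below are ours for illustration, not fixed by the text above:
\[
  \beta(p) \;=\; \mathbb{E}_{x \sim p}\Big[\, 1 - \max_{y}\, p(y \mid x) \,\Big],
\]
i.e., the irreducible error of the Bayes-optimal classifier under data distribution $p$. Requiring robustness within a perturbation set effectively evaluates the classifier on an altered distribution $p'$; since $\beta(p')$ may exceed $\beta(p)$, no classifier, certified or otherwise, can achieve accuracy above $1 - \beta(p')$ on the altered distribution, which is the sense in which the upper bound above is fundamental.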