Recent studies have identified a critical challenge in deep neural networks (DNNs) known as ``robust fairness'', where models exhibit significant disparities in robust accuracy across different classes. While prior work has attempted to address this issue for adversarial robustness, worst-class certified robustness for smoothed classifiers remains unexplored. Our work bridges this gap by developing a PAC-Bayesian bound on the worst-class error of smoothed classifiers. Through theoretical analysis, we demonstrate that the largest eigenvalue of the smoothed confusion matrix fundamentally governs the worst-class error of smoothed classifiers. Based on this insight, we introduce a regularization method that penalizes the largest eigenvalue of the smoothed confusion matrix to enhance the worst-class accuracy of the smoothed classifier and thereby improve its worst-class certified robustness. We provide extensive experimental validation across multiple datasets and model architectures to demonstrate the effectiveness of our approach.
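To make the proposed penalty concrete, the following is a minimal sketch, not the paper's implementation: it estimates a smoothed confusion matrix by Monte Carlo sampling of Gaussian input noise and computes its largest singular value by power iteration as a surrogate for the largest-eigenvalue penalty. The function names, the toy classifier, and the choice of spectral norm as the optimized quantity are all illustrative assumptions.

```python
import numpy as np

def smoothed_confusion_matrix(predict, x, y, num_classes, sigma=0.25,
                              n_samples=32, seed=0):
    # C[i, j] estimates P(prediction = j | true class = i) for the
    # smoothed classifier, i.e. under Gaussian input noise N(0, sigma^2 I).
    rng = np.random.default_rng(seed)
    C = np.zeros((num_classes, num_classes))
    for _ in range(n_samples):
        preds = predict(x + sigma * rng.standard_normal(x.shape))
        for yi, pi in zip(y, preds):
            C[yi, pi] += 1.0
    row = C.sum(axis=1, keepdims=True)
    return C / np.maximum(row, 1.0)  # row-normalize; empty rows stay zero

def spectral_penalty(C, iters=200):
    # Power iteration on C^T C yields the largest singular value of C,
    # used here as a simple surrogate for the largest-eigenvalue term.
    v = np.ones(C.shape[1]) / np.sqrt(C.shape[1])
    for _ in range(iters):
        w = C.T @ (C @ v)
        v = w / np.linalg.norm(w)
    return np.linalg.norm(C @ v)

# Toy usage: a hand-made "classifier" on 1-D inputs with three classes.
def predict(x):
    return np.clip(np.round(x).astype(int), 0, 2)

x = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0])
y = np.array([0, 0, 1, 1, 2, 2])
C = smoothed_confusion_matrix(predict, x, y, num_classes=3)
penalty = spectral_penalty(C)  # would be added to the training loss, scaled by a weight
```

In training, the penalty would be recomputed on each batch and combined with the classification loss; a differentiable (soft) confusion matrix would replace the hard argmax counts used above.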