Adversarial robustness often comes at the cost of degraded accuracy, impeding the real-life application of robust classification models. Training-based solutions for better trade-offs are limited by incompatibilities with already-trained high-performance large models, necessitating the exploration of training-free ensemble approaches. Observing that robust models are more confident in correct predictions than in incorrect ones on clean and adversarial data alike, we speculate that amplifying this "benign confidence property" can reconcile accuracy and robustness in an ensemble setting. To this end, we propose "MixedNUTS", a training-free method in which the output logits of a robust classifier and a standard non-robust classifier are processed by nonlinear transformations with only three parameters, which are optimized through an efficient algorithm. MixedNUTS then converts the transformed logits into probabilities and mixes them as the overall output. On the CIFAR-10, CIFAR-100, and ImageNet datasets, experiments with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness: it boosts CIFAR-100 clean accuracy by 7.86 points while sacrificing merely 0.87 points in robust accuracy.
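The transform-then-mix pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the form of the nonlinear transformation, so the one below (standardize, shift by `c`, clamp at zero, raise to the power `p`, scale by `s`) is an assumption chosen only because it has three parameters. The mixing weight `alpha` and all function names are hypothetical as well, and for brevity only the robust classifier's logits are transformed. The parameter optimization step is not shown.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def transform_logits(logits, s=1.0, p=1.0, c=0.0):
    """Hypothetical three-parameter nonlinearity (an assumption, not the
    paper's confirmed formula): standardize each row, shift by c,
    clamp negatives to zero, raise to the power p, then scale by s."""
    mean = logits.mean(axis=-1, keepdims=True)
    std = logits.std(axis=-1, keepdims=True) + 1e-8
    z = np.maximum((logits - mean) / std + c, 0.0)
    return s * z ** p


def mix_probs(std_logits, rob_logits, alpha=0.5, s=1.0, p=1.0, c=0.0):
    """Convert both classifiers' logits to probabilities and take a
    convex combination; alpha is an assumed mixing weight in [0, 1]."""
    p_std = softmax(std_logits)
    p_rob = softmax(transform_logits(rob_logits, s=s, p=p, c=c))
    return (1.0 - alpha) * p_std + alpha * p_rob


# Example: one input, three classes, made-up logits.
std_logits = np.array([[2.0, 0.5, -1.0]])
rob_logits = np.array([[0.2, 1.5, 0.1]])
mixed = mix_probs(std_logits, rob_logits, alpha=0.5, s=5.0, p=2.0, c=0.0)
prediction = mixed.argmax(axis=-1)
```

Because the output is a convex combination of two probability vectors, each row of `mixed` is itself a valid probability distribution. In this sketch, a larger `s` or `p` makes the robust branch more peaked wherever it is already confident, which is one way to read "amplifying" the confidence gap.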