In the context of adversarial robustness, we make three strongly related contributions. First, we prove that while attacking ReLU classifiers is $\mathit{NP}$-hard, ensuring their robustness at training time is $\Sigma_2^P$-hard (even on a single example). This asymmetry provides a rationale for the fact that robust classification approaches are frequently fooled in the literature. Second, we show that inference-time robustness certificates are not affected by this asymmetry, by introducing a proof-of-concept approach named Counter-Attack (CA). Indeed, CA displays a reversed asymmetry: running the defense is $\mathit{NP}$-hard, while attacking it is $\Sigma_2^P$-hard. Finally, motivated by our previous result, we argue that adversarial attacks can be used in the context of robustness certification, and provide an empirical evaluation of their effectiveness. As a byproduct of this process, we also release UG100, a benchmark dataset for adversarial attacks.
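
As an informal sketch of where the asymmetry comes from, the two problems can be written with explicit quantifiers. The notation here is an assumption introduced for illustration and does not come from the abstract: $f_\theta$ is a classifier with parameters $\theta$, $(x, y)$ is a single labelled example, and $\varepsilon$ is a perturbation budget under some norm.

$$
\text{attack:}\quad \exists x' .\ \|x' - x\| \le \varepsilon \ \wedge\ f_\theta(x') \ne y
$$

$$
\text{robust training:}\quad \exists \theta .\ \forall x' .\ \|x' - x\| \le \varepsilon \ \Rightarrow\ f_\theta(x') = y
$$

The attack asks for a single witness $x'$, which is the purely existential shape typical of $\mathit{NP}$. Robust training asks for parameters that hold up against every admissible perturbation, which is the $\exists\forall$ shape typical of $\Sigma_2^P$. The formulas show only the quantifier structure; they are not a substitute for the hardness proofs.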