It is well known that recurrent neural networks (RNNs), although widely used, are vulnerable to adversarial attacks, including one-frame attacks and multi-frame attacks. Although a few certified defenses provide guaranteed robustness against one-frame attacks, we prove that defending against multi-frame attacks remains challenging because of their enormous perturbation space. In this paper, we propose RNN-Guard, the first certified defense for RNNs against multi-frame attacks. To address the above challenge, we adopt a perturb-all-frame strategy to construct perturbation spaces consistent with those of multi-frame attacks. However, this strategy causes a precision issue in linear relaxations. To address this issue, we introduce a novel abstract domain called InterZono and design tighter relaxations. We prove that InterZono is more precise than Zonotope while having the same time complexity. Experimental evaluations across various datasets and model structures show that the certified robust accuracy computed by RNN-Guard with InterZono is up to 2.18 times higher than that computed with Zonotope. In addition, we extend RNN-Guard into the first certified training method against multi-frame attacks, which directly enhances the robustness of RNNs. The results show that the certified robust accuracy of models trained with RNN-Guard against multi-frame attacks is 15.47 to 67.65 percentage points higher than that of models trained with other methods.
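To make the perturb-all-frame idea concrete, the following is a minimal sketch of how a perturbation space covering every frame can be propagated through an RNN to obtain sound output bounds. It uses plain interval arithmetic, which is neither InterZono nor Zonotope and is generally looser than both; it only illustrates the kind of over-approximation that such abstract domains refine. The vanilla tanh cell, the L-infinity radius `eps`, and all function names (`interval_affine`, `rnn_interval_bounds`, `rnn_forward`) are assumptions made for illustration and do not come from the paper.

```python
import math


def interval_affine(W, lo, hi, b):
    """Sound bounds on W @ x + b for any x with lo <= x <= hi (elementwise)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, a, c in zip(row, lo, hi):
            if w >= 0:
                # Positive weight: lower input bound gives lower output bound.
                l += w * a
                h += w * c
            else:
                # Negative weight: the roles of the bounds swap.
                l += w * c
                h += w * a
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def rnn_interval_bounds(frames, eps, W_x, W_h, b):
    """Bound the final hidden state of h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    when EVERY frame may be perturbed by up to eps in the L-infinity norm."""
    n = len(b)
    zero = [0.0] * n
    h_lo, h_hi = [0.0] * n, [0.0] * n  # initial hidden state is exactly zero
    for x in frames:
        x_lo = [v - eps for v in x]
        x_hi = [v + eps for v in x]
        a_lo, a_hi = interval_affine(W_x, x_lo, x_hi, b)
        r_lo, r_hi = interval_affine(W_h, h_lo, h_hi, zero)
        # tanh is monotonically increasing, so applying it to the
        # pre-activation bounds yields sound post-activation bounds.
        h_lo = [math.tanh(p + q) for p, q in zip(a_lo, r_lo)]
        h_hi = [math.tanh(p + q) for p, q in zip(a_hi, r_hi)]
    return h_lo, h_hi


def rnn_forward(frames, W_x, W_h, b):
    """Concrete forward pass of the same cell, for comparison with the bounds."""
    h = [0.0] * len(b)
    for x in frames:
        h = [
            math.tanh(
                sum(w * v for w, v in zip(rx, x))
                + sum(w * v for w, v in zip(rh, h))
                + bias
            )
            for rx, rh, bias in zip(W_x, W_h, b)
        ]
    return h
```

Because the interval width accumulates through the recurrence at every perturbed frame, the bounds from this sketch loosen as sequences grow longer. This matches the general motivation for using tighter abstract domains when all frames are perturbed.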