The vulnerability of deep neural network models to adversarial example attacks is a practical challenge in many artificial intelligence applications. A recent line of work shows that randomization in adversarial training is key to finding optimal strategies against adversarial example attacks. However, in a fully randomized setting, where both the defender and the attacker can use randomized strategies, there is no efficient algorithm for finding such an optimal strategy. To fill this gap, we propose the first algorithm of its kind, called FRAT, which models the problem with a new infinite-dimensional continuous-time flow on probability distribution spaces. FRAT maintains a lightweight mixture of models for the defender, with the flexibility to efficiently update both the mixing weights and the model parameters at each iteration. Furthermore, FRAT uses lightweight sampling subroutines to construct a randomized strategy for the attacker. We prove that the continuous-time limit of FRAT converges to a mixed Nash equilibrium of the zero-sum game between the defender and the attacker. Experimental results also demonstrate the efficiency of FRAT on the CIFAR-10 and CIFAR-100 datasets.
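To make the notion of a mixed Nash equilibrium in a defender-attacker zero-sum game concrete, the following is a minimal sketch on a finite toy game. It is not FRAT: it swaps in plain Hedge (multiplicative weights) self-play over two fixed models and two fixed attacks, whereas FRAT works with a flow on probability distributions over model parameters and perturbations. The loss matrix `LOSS` and the names `hedge_self_play` and `duality_gap` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Turn log-weights into a probability vector (numerically stable)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hedge_self_play(loss, steps=5000, lr=0.05):
    """Approximate a mixed Nash equilibrium of a zero-sum matrix game.

    loss[i][j] is the defender's loss when it uses model i and the attacker
    uses attack j. The defender minimizes, the attacker maximizes.
    Returns the time-averaged mixed strategies of both players.
    """
    n, m = len(loss), len(loss[0])
    logit_p = [1.0] + [0.0] * (n - 1)  # defender starts biased toward model 0
    logit_q = [0.0] * m                # attacker starts uniform
    avg_p, avg_q = [0.0] * n, [0.0] * m
    for _ in range(steps):
        p, q = softmax(logit_p), softmax(logit_q)
        # Expected loss of each pure model against the attacker's mixture.
        loss_p = [sum(loss[i][j] * q[j] for j in range(m)) for i in range(n)]
        # Expected loss each pure attack inflicts on the defender's mixture.
        gain_q = [sum(p[i] * loss[i][j] for i in range(n)) for j in range(m)]
        # Hedge updates: defender descends on its loss, attacker ascends.
        logit_p = [w - lr * l for w, l in zip(logit_p, loss_p)]
        logit_q = [w + lr * g for w, g in zip(logit_q, gain_q)]
        avg_p = [a + x / steps for a, x in zip(avg_p, p)]
        avg_q = [a + x / steps for a, x in zip(avg_q, q)]
    return avg_p, avg_q


def duality_gap(loss, p, q):
    """Worst-case loss of p minus best-case loss against q (0 at equilibrium)."""
    n, m = len(loss), len(loss[0])
    worst = max(sum(p[i] * loss[i][j] for i in range(n)) for j in range(m))
    best = min(sum(loss[i][j] * q[j] for j in range(m)) for i in range(n))
    return worst - best


# Toy game (assumed): each model is robust to one attack and fails on the other.
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]

avg_p, avg_q = hedge_self_play(LOSS)
gap = duality_gap(LOSS, avg_p, avg_q)
pure_gap = duality_gap(LOSS, [1.0, 0.0], avg_q)
print("defender mixture:", avg_p)
print("attacker mixture:", avg_q)
print("duality gap (mixed):", gap, "| duality gap (pure model 0):", pure_gap)
```

In this toy game any single model can be fully exploited by the matching attack, while the averaged mixed strategies have a small duality gap. That is the sense in which randomization on both sides matters in the fully randomized setting described above.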