Deep neural networks and other modern machine learning models are often susceptible to adversarial attacks. Indeed, an adversary is frequently able to change a model's prediction through a small, directed perturbation of the model's input, a serious concern in safety-critical applications. Adversarially robust machine learning is usually based on a min-max optimisation problem that minimises the machine learning loss under maximisation-based adversarial attacks. In this work, we study adversaries that determine their attack using a Bayesian statistical approach rather than maximisation. The resulting Bayesian adversarial robustness problem is a relaxation of the usual min-max problem. To solve this problem, we propose Abram, a continuous-time particle system designed to approximate the gradient flow corresponding to the underlying learning problem. We show that Abram approximates a McKean-Vlasov process and justify the use of Abram by giving assumptions under which the McKean-Vlasov process finds the minimiser of the Bayesian adversarial robustness problem. We discuss two ways to discretise Abram and demonstrate its suitability in benchmark adversarial deep learning experiments.
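For intuition, here is a minimal sketch of the relaxation in assumed notation; the loss $\ell$, parameter $\theta$, data point $z$, perturbation $\delta$, attack budget $\varepsilon$, inverse temperature $\lambda$, and the Gibbs form of $\pi_{\theta,z}$ are illustrative choices, not taken verbatim from the paper. The standard min-max problem and its Bayesian relaxation can then be written as
\[
\min_{\theta} \; \mathbb{E}_{z \sim \mathcal{D}} \Big[ \max_{\|\delta\| \le \varepsilon} \ell(\theta, z + \delta) \Big]
\qquad \text{vs.} \qquad
\min_{\theta} \; \mathbb{E}_{z \sim \mathcal{D}} \Big[ \mathbb{E}_{\delta \sim \pi_{\theta,z}} \big[ \ell(\theta, z + \delta) \big] \Big],
\]
where the adversary's attack distribution is of Gibbs type,
\[
\pi_{\theta,z}(\mathrm{d}\delta) \propto \exp\!\big( \lambda\, \ell(\theta, z + \delta) \big)\, \mathbf{1}\{ \|\delta\| \le \varepsilon \}\, \mathrm{d}\delta .
\]
As $\lambda \to \infty$ the inner expectation converges to the maximum over the attack ball, recovering the min-max problem; finite $\lambda$ yields the relaxation. Under this reading, the inner maximisation becomes a sampling problem over $\pi_{\theta,z}$, which is what motivates approximating the associated gradient flow with an interacting particle system such as Abram.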