We study the problem of certifying the robustness of Bayesian neural networks (BNNs) to adversarial input perturbations. Given a compact set of input points $T \subseteq \mathbb{R}^m$ and a set of output points $S \subseteq \mathbb{R}^n$, we define two notions of robustness for BNNs in an adversarial setting: probabilistic robustness and decision robustness. Probabilistic robustness is the probability that, for all points in $T$, the output of a BNN sampled from the posterior lies in $S$. Decision robustness, on the other hand, considers the optimal decision of a BNN for a given loss function and checks whether, for all points in $T$, this decision lies within the output set $S$. Although exact computation of these robustness properties is challenging due to the probabilistic and non-convex nature of BNNs, we present a unified computational framework for efficiently and formally bounding them. Our approach is based on weight interval sampling, integration, and bound propagation techniques. It scales to BNNs with a large number of parameters and is independent of the (approximate) inference method used to train the BNN. We evaluate the effectiveness of our methods on various regression and classification tasks, including an industrial regression benchmark, MNIST, traffic sign recognition, and airborne collision avoidance, and demonstrate that our approach enables certification of the robustness and uncertainty of BNN predictions.
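The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind combining weight intervals with bound propagation to lower-bound probabilistic robustness. It assumes a mean-field Gaussian posterior, a small fully connected ReLU network, and box-shaped sets $T$ and $S$. Instead of sampling a collection of disjoint weight intervals as the framework described above does, it tries one weight box $[\mu - k\sigma, \mu + k\sigma]$ per candidate width $k$: if interval bound propagation shows that every weight in the box maps every input in $T$ into $S$, the posterior mass of that box is a sound lower bound on the probability. All function names, the widths `ks`, and the toy network are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
# Hypothetical layer format for a mean-field Gaussian BNN: (W_mu, W_sigma, b_mu, b_sigma),
# with W_* as lists of rows (out x in) and b_* as lists (out).
Layer = Tuple[List[List[float]], List[List[float]], List[float], List[float]]


def gaussian_interval_prob(mu: float, sigma: float, lo: float, hi: float) -> float:
    """P(lo <= w <= hi) for a scalar weight w ~ N(mu, sigma^2)."""
    def cdf(z: float) -> float:
        return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)


def interval_mul(a: Interval, b: Interval) -> Interval:
    """Exact range of the product of two scalar intervals."""
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return min(ps), max(ps)


def propagate(layers: Sequence[Layer], x_box: List[Interval], k: float) -> List[Interval]:
    """Interval bounds on the output for all inputs in x_box and all weights in
    [mu - k*sigma, mu + k*sigma]; ReLU is applied after every layer but the last."""
    h = x_box
    for idx, (W_mu, W_sd, b_mu, b_sd) in enumerate(layers):
        out = []
        for row_mu, row_sd, bm, bs in zip(W_mu, W_sd, b_mu, b_sd):
            lo, hi = bm - k * bs, bm + k * bs  # bias interval
            for wm, ws, xi in zip(row_mu, row_sd, h):
                p_lo, p_hi = interval_mul((wm - k * ws, wm + k * ws), xi)
                lo, hi = lo + p_lo, hi + p_hi
            out.append((lo, hi))
        if idx < len(layers) - 1:  # ReLU is monotone, so apply it to both endpoints
            out = [(max(0.0, lo), max(0.0, hi)) for lo, hi in out]
        h = out
    return h


def box_probability(layers: Sequence[Layer], k: float) -> float:
    """Posterior mass of the weight box, using independence of the Gaussian weights."""
    p = 1.0
    for W_mu, W_sd, b_mu, b_sd in layers:
        for row_mu, row_sd in zip(W_mu, W_sd):
            for wm, ws in zip(row_mu, row_sd):
                p *= gaussian_interval_prob(wm, ws, wm - k * ws, wm + k * ws)
        for bm, bs in zip(b_mu, b_sd):
            p *= gaussian_interval_prob(bm, bs, bm - k * bs, bm + k * bs)
    return p


def prob_robustness_lower_bound(layers, x_box, s_box, ks=(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)):
    """Lower bound on P(for all x in T: f^w(x) in S): any single weight box on which
    the property is verified contributes its posterior mass; keep the largest."""
    best = 0.0
    for k in ks:
        y = propagate(layers, x_box, k)
        if all(s_lo <= lo and hi <= s_hi for (lo, hi), (s_lo, s_hi) in zip(y, s_box)):
            best = max(best, box_probability(layers, k))
    return best


# Toy 1-2-1 network whose mean weights compute |x|; T = [0.9, 1.1], S = [0, 1.9].
layers = [
    ([[1.0], [-1.0]], [[0.1], [0.1]], [0.0, 0.0], [0.05, 0.05]),
    ([[1.0, 1.0]], [[0.1, 0.1]], [0.0], [0.05]),
]
x_box = [(0.9, 1.1)]
s_box = [(0.0, 1.9)]
print(prob_robustness_lower_bound(layers, x_box, s_box))  # roughly 0.722 (k = 2 verifies, k = 2.5 does not)
```

Because the candidate boxes are nested, taking the largest verified one is enough here; a tighter bound would sum the masses of several disjoint verified boxes, which is closer to the interval-sampling approach the abstract refers to.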