We consider the problem of certifying the robustness of deep neural networks against real-world distribution shifts. To do so, we bridge the gap between hand-crafted specifications and realistic deployment settings by proposing a novel neural-symbolic verification framework, in which we train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model. A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations, which are fundamental to many state-of-the-art generative models. To address this challenge, we propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement. The key idea is to "lazily" refine the abstraction of sigmoid functions to exclude spurious counter-examples found in the previous abstraction, thus guaranteeing progress in the verification process while keeping the state-space small. Experiments on the MNIST and CIFAR-10 datasets show that our framework significantly outperforms existing methods on a range of challenging distribution shifts.
翻译:我们研究了针对真实世界分布偏移的深度神经网络鲁棒性认证问题。为此,我们提出了一种新颖的神经符号验证框架,通过训练生成模型从数据中学习扰动,并基于该模型输出定义规范,从而弥合手工规范与真实部署环境之间的鸿沟。该设定带来的独特挑战在于:现有验证器无法紧密近似Sigmoid激活函数,而此类函数是众多前沿生成模型的核心组件。为应对这一挑战,我们提出了一种通用的元算法来处理Sigmoid激活函数,该算法利用反例引导的抽象精化经典理论。其核心思想是通过"惰性"精化Sigmoid函数的抽象表示,排除先前抽象中发现的伪反例,从而在保持状态空间紧凑的同时确保验证进程的收敛性。在MNIST和CIFAR-10数据集上的实验表明,我们的框架在多种具有挑战性的分布偏移场景中显著优于现有方法。