Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represented as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, often limited to very small regions in the input space. In this paper, we study the following problem: Given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner. Whereas the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications, yet they can significantly expand the verifiable boundaries to several orders of magnitude larger.
翻译:规范在神经网络验证中起着关键作用。它们定义了我们要验证的精确输入区域,通常表示为L-无穷范数球。虽然最近的研究建议使用神经激活模式(NAP)作为验证未见测试集数据的规范,但其重点在于计算最精细的NAP,往往仅限于输入空间中非常小的区域。在本文中,我们研究如下问题:给定一个神经网络,找到一个最小(最粗)的、足以正式验证网络鲁棒性的NAP。找到最小NAP规范不仅能扩展可验证的边界,还能揭示哪些神经元对模型的鲁棒性有贡献。为解决这一问题,我们提出了几种精确和近似的方法。我们的精确方法利用验证工具以确定性或统计方式找到最小NAP规范。而近似方法则通过对抗样本和局部梯度高效地估计最小NAP,无需调用验证工具。这使我们能够检查神经元与最先进神经网络鲁棒性之间的潜在因果关系,而现有验证框架无法胜任此类任务。实验结果表明,与最精细的NAP规范相比,最小NAP规范所需的神经元比例小得多,却能显著扩大可验证边界达数个数量级。