Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represented as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, often limited to very small regions in the input space. In this paper, we study the following problem: Given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner. Whereas the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications, yet they can significantly expand the verifiable boundaries to several orders of magnitude larger.
翻译:规范在神经网络验证中起着至关重要的作用。它们定义了我们需要验证的精确输入区域,通常表示为L-无穷范数球。尽管最近的研究建议使用神经激活模式(NAPs)作为验证未见测试集数据的规范,但其侧重于计算最精细的NAPs,通常仅限于输入空间中非常小的区域。在本文中,我们研究以下问题:给定一个神经网络,找到一个足以形式化验证该网络鲁棒性的最小(最粗糙)NAP。寻找最小NAP规范不仅能扩展可验证的边界,还能提供关于哪些神经元对模型鲁棒性有贡献的洞见。为解决此问题,我们提出了几种精确和近似的方法。我们的精确方法利用验证工具,以确定性或统计性的方式寻找最小NAP规范。而近似方法则通过使用对抗样本和局部梯度来高效估计最小NAPs,无需调用验证工具。这使我们能够检查最先进神经网络的神经元与其鲁棒性之间潜在的因果关系,而现有的验证框架难以扩展到此类任务。我们的实验结果表明,与最精细的NAP规范相比,最小NAP规范所需的神经元比例要小得多,却能将可验证边界显著扩展到几个数量级之大。