Having reliable specifications is an unavoidable challenge in achieving verifiable correctness, robustness, and interpretability of AI systems. Existing specifications for neural networks are in the paradigm of data as specification. That is, the local neighborhood centering around a reference input is considered to be correct (or robust). While existing specifications contribute to verifying adversarial robustness, a significant problem in many research domains, our empirical study shows that those verified regions are somewhat tight, and thus fail to allow verification of test set inputs, making them impractical for some real-world applications. To this end, we propose a new family of specifications called neural representation as specification, which uses the intrinsic information of neural networks - neural activation patterns (NAPs), rather than input data to specify the correctness and/or robustness of neural network predictions. We present a simple statistical approach to mining neural activation patterns. To show the effectiveness of discovered NAPs, we formally verify several important properties, such as various types of misclassifications will never happen for a given NAP, and there is no ambiguity between different NAPs. We show that by using NAP, we can verify a significant region of the input space, while still recalling 84% of the data on MNIST. Moreover, we can push the verifiable bound to 10 times larger on the CIFAR10 benchmark. Thus, we argue that NAPs can potentially be used as a more reliable and extensible specification for neural network verification.
翻译:拥有可靠的规范是实现人工智能系统可验证正确性、鲁棒性和可解释性所面临的不可避免的挑战。现有的神经网络规范遵循"数据即规范"范式,即认为以参考输入为中心的局部邻域是正确的(或鲁棒的)。虽然现有规范有助于验证对抗鲁棒性——这一在许多研究领域中的重要问题——但我们的实证研究表明,这些经过验证的区域较为紧缩,导致无法验证测试集输入,使其在某些实际应用中缺乏实用性。为此,我们提出一类新规范——"神经表征即规范",利用神经网络的固有信息(神经激活模式,NAPs)而非输入数据来指定神经网络预测的正确性和/或鲁棒性。我们提出了一种简单的统计方法来挖掘神经激活模式。为证明所发现NAPs的有效性,我们正式验证了几个重要性质,例如:给定NAP下永远不会发生各类误分类,且不同NAP之间不存在歧义。研究表明,通过使用NAP,我们能够验证输入空间中的显著区域,同时在MNIST数据集上仍能召回84%的数据。此外,在CIFAR10基准测试中,我们可将可验证边界扩大至原来的10倍。因此,我们认为NAPs有潜力作为更可靠且可扩展的神经网络验证规范。