TADS are a novel, concise white-box representation of neural networks. In this paper, we apply TADS to the problem of neural network verification, using them to generate either proofs or concise error characterizations for desirable neural network properties. In a case study, we consider the robustness of neural networks to adversarial attacks, i.e., small changes to an input that drastically change a neural networks perception, and show that TADS can be used to provide precise diagnostics on how and where robustness errors a occur. We achieve these results by introducing Precondition Projection, a technique that yields a TADS describing network behavior precisely on a given subset of its input space, and combining it with PCA, a traditional, well-understood dimensionality reduction technique. We show that PCA is easily compatible with TADS. All analyses can be implemented in a straightforward fashion using the rich algebraic properties of TADS, demonstrating the utility of the TADS framework for neural network explainability and verification. While TADS do not yet scale as efficiently as state-of-the-art neural network verifiers, we show that, using PCA-based simplifications, they can still scale to mediumsized problems and yield concise explanations for potential errors that can be used for other purposes such as debugging a network or generating new training samples.
翻译:TADS是一种新颖且简洁的神经网络白盒表示形式。本文中,我们将TADS应用于神经网络验证问题,利用它们生成神经网络所需属性的证明或简洁的错误特征描述。通过一个案例研究,我们考察了神经网络对对抗性攻击的鲁棒性——即输入微小变化导致神经网络感知剧烈改变——并证明TADS能够精确诊断鲁棒性错误发生的方式与位置。我们引入了前提投影技术,该技术可在给定的输入子空间上精确描述网络行为并生成TADS,再将其与经典且易于理解的降维技术PCA相结合,从而实现了上述成果。研究表明,PCA与TADS能够轻松兼容。所有分析均可利用TADS丰富的代数性质以直接的方式实现,这展示了TADS框架对神经网络可解释性和验证的实用性。尽管TADS目前尚无法达到最先进神经网络验证器的效率,但我们证明,借助基于PCA的简化方法,它们仍可扩展到中等规模的问题,并为潜在错误生成可用于网络调试或生成新训练样本等目的的简洁解释。