NLP Verification: Towards a General Methodology for Certifying Robustness

Deep neural networks have exhibited substantial success in the field of Natural Language Processing, and ensuring their safety and reliability is crucial: there are safety-critical contexts where such models must be robust to variability or attack and give guarantees over their output. Unlike Computer Vision, NLP lacks a unified verification methodology and, despite recent advances in the literature, existing work is often light on the pragmatic issues of NLP verification. In this paper, we attempt to distil and evaluate the general components of an NLP verification pipeline that emerge from the progress in the field to date. Our contributions are twofold. Firstly, we give a general (i.e. algorithm-independent) characterisation of verifiable subspaces that result from embedding sentences into continuous spaces. We identify, and give an effective method to deal with, the technical challenge of semantic generalisability of verified subspaces, and propose generalisability as a standard metric in NLP verification pipelines (alongside the standard metrics of model accuracy and model verifiability). Secondly, we propose a general methodology to analyse the effect of the embedding gap, a problem that refers to the discrepancy between the verification of geometric subspaces and the semantic meaning of the sentences that those subspaces are supposed to represent. In extreme cases, poor choices in the embedding of sentences may invalidate verification results. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap; in particular, we propose the falsifiability of semantic subspaces as another fundamental metric to be reported as part of the NLP verification pipeline. We believe that together these general principles pave the way towards a more consolidated and effective development of this new domain.
