Reasoning has been a central topic in artificial intelligence from the beginning. Recent progress on distributed representations and neural networks continues to improve the state-of-the-art performance of natural language inference (NLI). However, it remains an open question whether the models perform real reasoning to reach their conclusions or rely on spurious correlations. Adversarial attacks have proven to be an important tool for exposing the Achilles' heel of victim models. In this study, we explore the fundamental problem of developing attack models based on logic formalism. We propose NatLogAttack to perform systematic attacks centring around natural logic, a classical logic formalism that is traceable back to Aristotle's syllogisms and has been developed closely alongside natural language inference. The proposed framework supports both label-preserving and label-flipping attacks. We show that, compared to existing attack models, NatLogAttack generates better adversarial examples with fewer queries to the victim models. The victim models are found to be more vulnerable under the label-flipping setting. NatLogAttack provides a tool to probe the capacity of existing and future NLI models from a key viewpoint, and we hope more logic-based attacks will be explored to better understand the desired property of reasoning.