Rigorous evaluation of the causal effects of semantic features on language model predictions can be hard to achieve for natural language reasoning problems. However, this form of analysis is so desirable from both an interpretability and a model evaluation perspective that it is valuable to investigate specific patterns of reasoning with enough structure and regularity to identify and quantify systematic reasoning failures in widely used models. In this vein, we pick a portion of the natural language inference (NLI) task for which an explicit causal diagram can be systematically constructed: the case where two related words or terms occur in a shared context across two sentences (the premise and the hypothesis). We apply causal effect estimation strategies to measure the effect of context interventions (whose effect on the entailment label is mediated by the semantic monotonicity characteristic) and of interventions on the inserted word pair (whose effect on the entailment label is mediated by the relation between these words). Extending related work on causal analysis of NLP models in different settings, we perform an extensive interventional study on the NLI task to investigate the robustness of Transformers to irrelevant changes and their sensitivity to impactful changes. The results strongly support the observation that models with similar benchmark accuracy scores may exhibit very different behaviour. Moreover, our methodology corroborates, from a causal perspective, previously suspected biases, including a bias in favour of upward-monotone contexts and a tendency to ignore the effects of negation markers.
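To make the causal structure concrete, the following is a minimal sketch in which the entailment label depends only on the two mediators named above: the monotonicity of the shared context and the relation between the inserted words. All names (`gold_label`, `flip_rate`, `upward_biased_model`) and the label encoding are illustrative assumptions, not taken from the study. The toy model is a hand-written stand-in for a context-insensitive predictor, not one of the evaluated Transformers, and the flip rate is a simplified proxy for a causal effect estimate.

```python
# Hypothetical encoding of the two mediators (illustrative names only).
# `relation` describes the premise word relative to the hypothesis word.

def gold_label(monotonicity: str, relation: str) -> str:
    """Label implied by the mediators: context monotonicity and word-pair relation."""
    if relation == "unrelated":
        return "non-entailment"
    if monotonicity == "upward":
        # Upward contexts license replacing a word with a more general one.
        return "entailment" if relation == "hyponym" else "non-entailment"
    if monotonicity == "downward":
        # Downward contexts license replacing a word with a more specific one.
        return "entailment" if relation == "hypernym" else "non-entailment"
    raise ValueError(f"unknown monotonicity: {monotonicity}")


def upward_biased_model(monotonicity: str, relation: str) -> str:
    """Toy stand-in (assumption): a predictor that treats every context as upward."""
    return gold_label("upward", relation)


def flip_rate(label_fn, interventions) -> float:
    """Fraction of (before, after) interventions for which label_fn's output changes."""
    changed = sum(label_fn(*before) != label_fn(*after) for before, after in interventions)
    return changed / len(interventions)


# Context interventions that flip monotonicity while holding the word relation fixed.
context_flips = [
    ((m, r), (other, r))
    for r in ("hyponym", "hypernym")
    for m, other in (("upward", "downward"), ("downward", "upward"))
]

gold_rate = flip_rate(gold_label, context_flips)            # label should always change
model_rate = flip_rate(upward_biased_model, context_flips)  # biased model never changes
print(gold_rate, model_rate)
```

In this sketch, a gap between the two rates signals insensitivity to an impactful intervention. Both functions could have the same accuracy on a test set dominated by upward-monotone contexts, which is the kind of discrepancy between accuracy and behaviour that an interventional study is meant to expose.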