Rigorous evaluation of the causal effects of semantic features on language model predictions can be hard to achieve for natural language reasoning problems. However, this form of analysis is so desirable from both an interpretability and a model evaluation perspective that it is worth homing in on specific patterns of reasoning with enough structure and regularity to identify and quantify systematic reasoning failures in widely used models. In this vein, we pick a portion of the natural language inference (NLI) task for which an explicit causal diagram can be systematically constructed: the case where, across two sentences (the premise and the hypothesis), two related words or terms occur in a shared context. In this work, we apply causal effect estimation strategies to measure the effect of context interventions (whose effect on the entailment label is mediated by the semantic monotonicity characteristic) and of interventions on the inserted word pair (whose effect on the entailment label is mediated by the relation between these words). Following related work on causal analysis of NLP models in different settings, we adapt the methodology to the NLI task to construct comparative model profiles in terms of robustness to irrelevant changes and sensitivity to impactful changes.