State-of-the-art neural models can now reach human-level performance across various natural language understanding tasks. However, despite this impressive performance, models are known to learn from annotation artefacts at the expense of the underlying task. While interpretability methods can identify influential features for each prediction, there is no guarantee that these features are responsible for the model decisions. To address this, we introduce a model-agnostic logical framework to determine the specific information in an input responsible for each model decision. This method creates interpretable Natural Language Inference (NLI) models that maintain their predictive power. We achieve this by generating facts that decompose complex NLI observations into individual logical atoms. Our model makes a prediction for each atom and uses logical rules to decide the class of the observation from these atom-level predictions. We apply our method to the highly challenging ANLI dataset, where our framework improves the performance of both a DeBERTa-base and a BERT baseline. Our method performs best on the most challenging examples, achieving a new state-of-the-art on the ANLI round 3 test set. We outperform every baseline in a reduced-data setting, and despite using no annotations for the generated facts, our model predictions for individual facts align with human expectations.
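As an illustration of how atom-level predictions could be combined with logical rules, the sketch below aggregates per-fact labels into one observation-level label and reports which facts were responsible. The abstract does not state the rules themselves, so the rule set used here (any contradicting fact gives contradiction, otherwise any entailing fact gives entailment, otherwise neutral) is an assumption made for illustration, and the names `AtomPrediction` and `decide` are hypothetical. The per-fact labels are supplied by hand rather than produced by a trained model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AtomPrediction:
    """A generated fact (logical atom) and the label predicted for it."""
    fact: str
    label: str  # "entailment", "neutral", or "contradiction"


def decide(atoms: List[AtomPrediction]) -> Tuple[str, List[str]]:
    """Combine atom-level labels into an observation-level label.

    Assumed rules (not taken from the abstract):
      1. If any atom is labelled contradiction, predict contradiction.
      2. Otherwise, if any atom is labelled entailment, predict entailment.
      3. Otherwise, predict neutral.
    Returns the predicted class and the facts that triggered the rule.
    """
    for target in ("contradiction", "entailment"):
        responsible = [a.fact for a in atoms if a.label == target]
        if responsible:
            return target, responsible
    # No atom triggered a rule, so no single fact is responsible.
    return "neutral", []


if __name__ == "__main__":
    # Hand-written labels standing in for model predictions.
    atoms = [
        AtomPrediction("A man is playing a guitar.", "neutral"),
        AtomPrediction("The man is on a stage.", "entailment"),
    ]
    label, facts = decide(atoms)
    print(label, facts)
```

Because the final class is a deterministic function of the atom-level labels, the facts returned alongside the label are exactly those that determined the decision under the assumed rules.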