The growing use of complex and opaque black-box models has created a need for interpretable alternatives; extractive rationalizing models are one such option. These models, also known as Explain-Then-Predict models, employ an explainer model to extract rationales and then condition the predictor on the extracted information. Their primary objective is to provide precise and faithful explanations, represented by the extracted rationales. In this paper, we take a semi-supervised approach to optimizing the plausibility of extracted rationales. We adopt a pre-trained natural language inference (NLI) model and further fine-tune it on a small set of supervised rationales ($10\%$). The NLI predictor is leveraged as a source of supervisory signal for the explainer via entailment alignment. We show that, by enforcing alignment agreement between the explanation and the answer in a question-answering task, performance can be improved without access to ground-truth labels. We evaluate our approach on the ERASER dataset and show that it achieves results comparable to supervised extractive models and outperforms unsupervised approaches by more than $100\%$.
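To make the entailment-alignment idea concrete, the following is a minimal sketch (not the authors' implementation) of how an off-the-shelf NLI model could score whether an extracted rationale entails the answer, with the negative log-probability of entailment serving as a supervisory loss for the explainer. The checkpoint name, the label ordering, and the answer-templating step are assumptions for illustration.

\begin{verbatim}
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed pre-trained NLI checkpoint; the paper's actual model may differ.
nli_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(nli_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name)

def entailment_alignment_loss(rationale: str, answer: str) -> torch.Tensor:
    """Negative log-probability that the extracted rationale entails
    the (templated) answer statement; lower means better alignment."""
    inputs = tokenizer(rationale, answer, return_tensors="pt",
                       truncation=True)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    # roberta-large-mnli label order: [contradiction, neutral, entailment]
    log_probs = torch.log_softmax(logits, dim=-1)
    return -log_probs[0, 2]

# Usage: penalize the explainer when its rationale does not entail the answer.
loss = entailment_alignment_loss(
    "The match was cancelled because of heavy rain.",
    "The match did not take place.",
)
\end{verbatim}

In a full training loop, this score would be backpropagated (or used as a reward) to update the explainer, supplementing the small $10\%$ set of supervised rationales; that wiring is omitted here.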