Natural language explanations (NLEs) are commonly used to provide plausible free-text accounts of a model's reasoning behind its predictions. However, recent work has questioned their faithfulness: they may not accurately reflect the model's internal reasoning process for the predicted answer. In contrast, highlight explanations--input fragments critical to the model's predicted answers--exhibit measurable faithfulness. Building on this foundation, we propose G-Tex, a Graph-Guided Textual Explanation Generation framework designed to enhance the faithfulness of NLEs. Specifically, highlight explanations are first extracted as faithful cues that reflect the model's reasoning toward its answer prediction. They are then encoded by a graph neural network layer that guides NLE generation, aligning the generated explanations with the model's underlying reasoning for the predicted answer. Experiments on T5 and BART across three reasoning datasets show that G-Tex improves NLE faithfulness by up to 12.18% over baseline methods. G-Tex also generates NLEs with greater semantic and lexical similarity to human-written ones. Human evaluations show that G-Tex reduces redundant content and improves the overall quality of NLEs. Our work presents a novel method for explicitly guiding NLE generation toward faithfulness, providing a foundation for addressing broader quality criteria in NLEs and generated text.