Free-text rationales play a pivotal role in explainable NLP, bridging the knowledge and reasoning gaps behind a model's decision-making. However, due to the diversity of potential reasoning paths and a corresponding lack of definitive ground truth, their evaluation remains a challenge. Existing evaluation metrics rely on the degree to which a rationale supports a target label, but we find these fall short in evaluating rationales that inadvertently leak the labels. To address this problem, we propose RORA, a Robust free-text Rationale evaluation against label leakage. RORA quantifies the new information supplied by a rationale to justify the label. This is achieved by assessing the conditional V-information \citep{hewitt-etal-2021-conditional} with a predictive family robust against leaky features that can be exploited by a small model. RORA consistently outperforms existing approaches in evaluating human-written, synthetic, or model-generated rationales, particularly demonstrating robustness against label leakage. We also show that RORA aligns well with human judgment, providing a more reliable and accurate measurement across diverse free-text rationales.
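As a rough sketch of the quantity RORA targets, following the conditional V-information framework of \citep{hewitt-etal-2021-conditional} (the notation below is our illustrative rendering, not the paper's exact estimator):

```latex
% Conditional V-information of a rationale R about label Y, given input X:
I_{\mathcal{V}}(R \to Y \mid X) \;=\; H_{\mathcal{V}}(Y \mid X) \;-\; H_{\mathcal{V}}(Y \mid X, R),
% where the conditional V-entropy is the best achievable expected log-loss
% within the predictive family V:
H_{\mathcal{V}}(Y \mid X) \;=\; \inf_{f \in \mathcal{V}} \, \mathbb{E}\!\left[-\log f[X](Y)\right].
```

A larger value means the rationale supplies information about the label beyond what the input alone provides; restricting $\mathcal{V}$ to predictors robust against leaky features is what shields the score from rationales that merely restate the label.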