Large Language Models (LLMs) have recently shown strong performance on Entity Resolution (ER). When prompted, these models can also generate self-explanations alongside their predictions. While such self-explanations are appealing due to their negligible computational cost, their actual reliability remains largely unexplored. In this paper, we present the first large-scale systematic evaluation of LLM self-explanations for ER, focusing on feature attribution and counterfactual explanations at both the attribute and token levels. Across three LLMs, ten datasets, and multiple prompting strategies, we show that self-explanations are often unstable, weakly faithful, and poorly aligned with counterfactual evidence, revealing a substantial gap between plausibility and causal relevance. We further demonstrate that established post-hoc explanation methods provide significantly higher trustworthiness, but at a prohibitive computational cost when applied to LLMs. To bridge this gap, we introduce \uncerta{}, a hybrid explanation framework that leverages self-explanations as priors to guide post-hoc exploration. \uncerta{} achieves explanation quality comparable to post-hoc methods while reducing cost by up to an order of magnitude.