Large Language Models (LLMs) are increasingly used as powerful tools in several high-stakes natural language processing (NLP) applications. Recent prompting works claim to elicit intermediate reasoning steps and key tokens that serve as proxy explanations for LLM predictions. However, it remains unclear whether these explanations are reliable and reflect the LLM's actual behavior. In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs. To this end, we propose two novel metrics, $\textit{Verbalized Uncertainty}$ and $\textit{Probing Uncertainty}$, to quantify the uncertainty of generated explanations. Verbalized uncertainty prompts the LLM to express its confidence in its explanations, whereas probing uncertainty uses sample and model perturbations to quantify the uncertainty. Our empirical analysis on benchmark datasets reveals that verbalized uncertainty is not a reliable estimate of explanation confidence. Further, we show that probing uncertainty estimates are correlated with the faithfulness of an explanation: lower uncertainty corresponds to higher faithfulness. Our study provides insights into the challenges and opportunities of quantifying uncertainty in LLM explanations, contributing to the broader discussion on the trustworthiness of foundation models.
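The idea behind probing uncertainty (explanations that stay stable under perturbation are more trustworthy) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption for illustration only: explanations are modeled as sets of key tokens, the hypothetical `explain` callable stands in for an LLM call, and uncertainty is taken as one minus the mean pairwise Jaccard similarity of the explanations collected under sample perturbations. Model perturbations could be probed the same way by swapping in different `explain` callables.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def probing_uncertainty(explanations: List[Set[str]]) -> float:
    """Hypothetical proxy: 1 - mean pairwise Jaccard similarity of the
    key-token explanations obtained under different perturbations.
    Returns 0.0 when every perturbed run yields the same explanation."""
    if len(explanations) < 2:
        raise ValueError("need at least two explanations to compare")
    pairs = list(combinations(explanations, 2))
    mean_sim = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_sim


def explain_under_perturbations(
    explain: Callable[[str], Set[str]],
    prompt: str,
    perturb_fns: Iterable[Callable[[str], str]],
) -> List[Set[str]]:
    """Collect one explanation per sample perturbation of the prompt.
    `explain` is a hypothetical stand-in for an LLM returning key tokens."""
    return [explain(perturb(prompt)) for perturb in perturb_fns]


if __name__ == "__main__":
    stable = [{"not", "good"}, {"not", "good"}, {"not", "good"}]
    unstable = [{"not", "good"}, {"good"}, {"movie"}]
    print(probing_uncertainty(stable))    # 0.0
    print(probing_uncertainty(unstable))  # 5/6, about 0.833
```

Under this assumed definition, identical explanations across perturbations give zero uncertainty, while largely disjoint key-token sets push the score toward one; the actual metric may aggregate differently.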