Quantifying uncertainty in the factual parametric knowledge of Large Language Models (LLMs), especially in a black-box setting, poses a significant challenge. Existing methods, which gauge a model's uncertainty by evaluating the self-consistency of its responses to the original query, do not always capture true uncertainty: a model may consistently give a wrong answer to the original query yet answer correctly when questions about the same query are posed from different perspectives, and vice versa. In this paper, we propose DiverseAgentEntropy, a novel method for evaluating a model's uncertainty through multi-agent interaction, under the assumption that if a model is certain, it should consistently recall the answer to the original query across a diverse collection of questions about that query. We further implement an abstention policy that withholds a response when uncertainty is high. Our method predicts the model's reliability more accurately and better detects hallucinations than other self-consistency-based methods. It also shows that existing models often fail to consistently retrieve the correct answer to the same query under diversely varied questions, even when they know the correct answer.
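As a rough illustration of the core idea, not the paper's exact formulation, the sketch below computes Shannon entropy over the answer distribution gathered from agents asked diversely varied questions about the same query, and abstains when entropy exceeds a threshold. The function name `diverse_agent_entropy`, the threshold `tau`, and the assumption that raw responses have already been canonicalized into equivalent answer strings (e.g., by semantic clustering) are illustrative choices, not details taken from the paper.

```python
from collections import Counter
from math import log

def diverse_agent_entropy(answers, tau=0.5):
    """Estimate uncertainty as the Shannon entropy of the answer
    distribution collected across agents, and abstain when it is high.

    `answers`: one canonicalized answer string per agent response;
    mapping raw responses to canonical answers is assumed upstream.
    `tau`: illustrative abstention threshold (in nats).
    """
    counts = Counter(answers)
    total = len(answers)
    # Empirical probability of each distinct answer.
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * log(p) for p in probs)
    # Abstention policy: withhold the answer when uncertainty is high.
    majority_answer = counts.most_common(1)[0][0]
    return majority_answer if entropy <= tau else None  # None = abstain

# Example: three agents agree, one diverges -> low entropy, answer kept.
print(diverse_agent_entropy(["Paris", "Paris", "Paris", "Lyon"]))
```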