Large language models (LLMs) exhibit impressive performance across diverse tasks but often struggle to accurately gauge their knowledge boundaries, leading to confident yet incorrect responses. This paper explores leveraging LLMs' internal states to enhance their perception of knowledge boundaries from efficiency and risk perspectives. We investigate whether LLMs can estimate their confidence using internal states before response generation, potentially saving computational resources. Our experiments on datasets like Natural Questions, HotpotQA, and MMLU reveal that LLMs demonstrate significant pre-generation perception, which is further refined post-generation, with perception gaps remaining stable across varying conditions. To mitigate risks in critical domains, we introduce Confidence Consistency-based Calibration ($C^3$), which assesses confidence consistency through question reformulation. $C^3$ significantly improves LLMs' ability to recognize their knowledge gaps, enhancing the unknown perception rate by 5.6% on NQ and 4.9% on HotpotQA. Our findings suggest that pre-generation confidence estimation can optimize efficiency, while $C^3$ effectively controls output risks, advancing the reliability of LLMs in practical applications.
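To make the consistency idea behind $C^3$ concrete, the sketch below is an illustrative, hypothetical implementation rather than the paper's actual method: it assumes some external `confidence_fn` that returns a model-derived confidence for a question, scores the original question together with its reformulations, and shrinks the average confidence when the scores disagree. The specific combination rule (scaling by one minus twice the population standard deviation) is an assumption chosen only for illustration.

```python
"""Illustrative sketch (not the paper's code): combine per-reformulation
confidence scores into a consistency-adjusted confidence, in the spirit of C^3."""

from statistics import mean, pstdev
from typing import Callable, Sequence


def c3_confidence(
    question: str,
    reformulations: Sequence[str],
    confidence_fn: Callable[[str], float],
) -> float:
    """Score the original question and its reformulations, then lower the
    averaged confidence when the per-question scores disagree."""
    scores = [confidence_fn(q) for q in (question, *reformulations)]
    avg = mean(scores)
    # Consistency in [0, 1]: 1 when all reformulations agree, lower otherwise.
    # pstdev of values in [0, 1] is at most 0.5, so scale by 2 before clipping.
    consistency = max(0.0, 1.0 - 2.0 * pstdev(scores))
    return avg * consistency


if __name__ == "__main__":
    # Toy confidence function standing in for a model-derived score.
    fake_scores = {
        "Who wrote Hamlet?": 0.92,
        "Hamlet was written by whom?": 0.90,
        "Name the author of the play Hamlet.": 0.45,  # inconsistent -> penalized
    }
    conf = c3_confidence(
        "Who wrote Hamlet?",
        ["Hamlet was written by whom?", "Name the author of the play Hamlet."],
        lambda q: fake_scores[q],
    )
    print(f"consistency-adjusted confidence: {conf:.2f}")
```

In this toy example the third reformulation receives a much lower score, so the adjusted confidence drops well below the raw average, mirroring how $C^3$ flags questions whose confidence is unstable under reformulation as likely lying outside the model's knowledge.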