Can large language models (LLMs) express their uncertainty in situations where they lack sufficient parametric knowledge to generate reasonable responses? This work aims to systematically investigate LLMs' behaviors in such situations, emphasizing the trade-off between honesty and helpfulness. To tackle the challenge of precisely determining LLMs' knowledge gaps, we diagnostically create unanswerable questions containing non-existent concepts or false premises, ensuring that they are outside the LLMs' vast training data. By compiling a benchmark, UnknownBench, which consists of both unanswerable and answerable questions, we quantitatively evaluate the LLMs' performance in maintaining honesty while being helpful. Using a model-agnostic unified confidence elicitation approach, we observe that most LLMs fail to consistently refuse or express uncertainty towards questions outside their parametric knowledge, although instruction fine-tuning and alignment techniques can provide marginal enhancements. Moreover, LLMs' uncertainty expression does not always stay consistent with the perceived confidence of their textual outputs.