Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs), such as ChatGPT, have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three primary research questions and analyze them by examining the QA performance, priori judgement, and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capability to answer questions and in the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach to enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. We also find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.