Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across many domains. However, these models are not flawless and often produce responses containing errors or misinformation. These inaccuracies, commonly referred to as hallucinations, make LLMs unreliable and even unusable in many scenarios. In this paper, we focus on mitigating hallucination in LLMs, particularly in the context of question answering. Rather than attempting to answer every question, we explore a refusal mechanism that instructs LLMs to decline challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates this refusal mechanism so that LLMs can recognize, and refuse to answer, questions they find difficult to address. To achieve this, we use a structured knowledge base to represent all of the LLM's knowledge of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty; it can be filled with validated knowledge and progressively expanded. When the LLM encounters a question, the system checks whether the question falls within the scope of this knowledge base and determines whether it can be answered. We also introduce a method for automatically and efficiently expanding the knowledge base. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.
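The refusal mechanism described above can be sketched in a few lines of code: the system consults an external, initially empty knowledge base and answers only when supporting knowledge is found, refusing otherwise. This is a minimal illustrative sketch, not the paper's actual implementation; the `KnowledgeBase` class, the lexical-overlap similarity, and the `threshold` value are all assumptions made for the example.

```python
# Minimal sketch of answering with a refusal fallback over an external
# knowledge base, in the spirit of L2R. The similarity measure and threshold
# below are illustrative assumptions, not the method proposed in the paper.

REFUSAL = "Sorry, I don't know the answer to that question."

class KnowledgeBase:
    """A structured, initially empty store of validated (question, answer) facts."""

    def __init__(self):
        self.entries = []  # list of (question, answer) pairs

    def add(self, question, answer):
        # Only validated knowledge should enter the base; validation is out of scope here.
        self.entries.append((question, answer))

    def retrieve(self, query, threshold=0.5):
        # Toy lexical-overlap similarity; a real system would use embeddings.
        def score(stored_question):
            a = set(query.lower().split())
            b = set(stored_question.lower().split())
            return len(a & b) / max(len(a | b), 1)

        best = max(self.entries, key=lambda e: score(e[0]), default=None)
        if best is None or score(best[0]) < threshold:
            return None  # question falls outside the knowledge scope
        return best[1]

def answer(kb, question):
    """Answer only when supporting knowledge exists; otherwise refuse."""
    knowledge = kb.retrieve(question)
    if knowledge is None:
        return REFUSAL  # refuse rather than risk hallucinating
    return knowledge

kb = KnowledgeBase()
kb.add("What is the capital of France?", "Paris")
print(answer(kb, "What is the capital of France?"))  # answered from the knowledge base
print(answer(kb, "Who wrote Hamlet?"))               # refused: no supporting knowledge
```

Because the knowledge base is separate from the model and every answer is backed by a stored entry, each response is traceable to validated knowledge, and expanding coverage is a matter of adding entries rather than retraining.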