Recently, AI assistants based on large language models (LLMs) have shown surprising performance on many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess extensive world knowledge, they still make factual errors on knowledge-intensive tasks such as open-domain question answering. These untruthful responses from the AI assistant may pose significant risks in practical applications. We believe that having an AI assistant refuse to answer questions it does not know is a crucial method for reducing hallucinations and making the assistant truthful. Therefore, in this paper, we ask the question "Can AI assistants know what they don't know and express this through natural language?" To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets. We then align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Experimental results show that after alignment with Idk datasets, the assistant can refuse to answer most of its unknown questions. For the questions it does attempt to answer, accuracy is significantly higher than before alignment.
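As a rough illustration of the dataset construction described above, the following sketch splits question answering pairs into known and unknown questions for one particular assistant. It is only a minimal sketch under stated assumptions: the names `build_idk_dataset`, `is_correct`, and `IDK_RESPONSE`, the single-answer substring check, and the refusal wording are all hypothetical and are not taken from the paper, whose actual labeling procedure may differ.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical refusal template; the wording is an assumption, not the paper's.
IDK_RESPONSE = "I don't know."


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a lenient string comparison."""
    return " ".join(text.lower().split())


def is_correct(prediction: str, references: List[str]) -> bool:
    """Assumed correctness rule: some non-empty reference appears in the prediction."""
    pred = normalize(prediction)
    return any(ref.strip() and normalize(ref) in pred for ref in references)


def build_idk_dataset(
    qa_pairs: List[Tuple[str, List[str]]],
    answer_fn: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Label each question as known or unknown for the given assistant.

    Known questions keep the assistant's own answer as the training target;
    unknown questions get the refusal template instead.
    """
    dataset = []
    for question, references in qa_pairs:
        prediction = answer_fn(question)
        if is_correct(prediction, references):
            dataset.append({"question": question, "label": "known", "target": prediction})
        else:
            dataset.append({"question": question, "label": "unknown", "target": IDK_RESPONSE})
    return dataset


# Toy stand-in for a real assistant (hypothetical): it knows one fact only.
def toy_assistant(question: str) -> str:
    canned = {"What is the capital of France?": "The capital of France is Paris."}
    return canned.get(question, "It is Sydney.")


qa_pairs = [
    ("What is the capital of France?", ["Paris"]),
    ("What is the capital of Australia?", ["Canberra"]),
]
idk_data = build_idk_dataset(qa_pairs, toy_assistant)
```

Because the labels depend on what `answer_fn` gets right, running the same questions through a different assistant yields a different dataset, which is the sense in which the Idk dataset is model-specific.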