Can AI Assistants Know What They Don't Know?

Recently, AI assistants based on large language models (LLMs) have shown surprising performance on many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess extensive world knowledge, they still make factual errors on some knowledge-intensive tasks, such as open-domain question answering. These untruthful responses from an AI assistant may cause significant risks in practical applications. We believe that an AI assistant's refusal to answer questions it does not know is a crucial method for reducing hallucinations and making the assistant truthful. Therefore, in this paper, we ask the question "Can AI assistants know what they don't know and express this through natural language?" To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets. We then align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Experimental results show that after alignment with Idk datasets, the assistant can refuse to answer most of its unknown questions. For the questions it attempts to answer, the accuracy is significantly higher than before the alignment.
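As a rough illustration of what building a model-specific Idk dataset could look like, the sketch below labels each question as known or unknown by sampling the assistant several times and checking the answers against the gold labels. The abstract does not give the construction details, so everything here is an assumption made for illustration: the names `build_idk_dataset`, `sample_fn`, `num_samples`, and `known_threshold`, the substring-match correctness check, and the fixed refusal string are all hypothetical and may differ from the authors' procedure.

```python
from typing import Callable, Dict, List

# Hypothetical refusal template; the abstract does not specify the wording.
REFUSAL = "I don't know."


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a lenient comparison."""
    return " ".join(text.lower().split())


def is_correct(response: str, gold_answers: List[str]) -> bool:
    """Assumed check: a response counts as correct if it contains any gold answer."""
    resp = normalize(response)
    return any(normalize(gold) in resp for gold in gold_answers)


def build_idk_dataset(
    qa_pairs: List[Dict],
    sample_fn: Callable[[str], str],
    num_samples: int = 10,
    known_threshold: float = 1.0,
) -> List[Dict]:
    """Label each question as known or unknown for one specific assistant.

    sample_fn(question) is assumed to return one sampled response from the
    assistant. A question is treated as known when the fraction of correct
    samples reaches known_threshold; otherwise its target becomes a refusal.
    """
    dataset = []
    for item in qa_pairs:
        responses = [sample_fn(item["question"]) for _ in range(num_samples)]
        correct = [r for r in responses if is_correct(r, item["answers"])]
        accuracy = len(correct) / num_samples
        if correct and accuracy >= known_threshold:
            # Known question: keep one of the assistant's own correct answers.
            target, known = correct[0], True
        else:
            # Unknown question: train the assistant to refuse instead.
            target, known = REFUSAL, False
        dataset.append(
            {"question": item["question"], "target": target, "known": known}
        )
    return dataset
```

Because the labels depend on the responses of the assistant being sampled, the same source questions would yield a different dataset for a different assistant, which is one way to read "model-specific" in the abstract. The resulting question and target pairs could then serve as alignment data.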

