Can AI Assistants Know What They Don't Know?

Recently, AI assistants based on large language models (LLMs) have shown surprising performance on many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess extensive world knowledge, they still make factual errors on knowledge-intensive tasks such as open-domain question answering. Such untruthful responses from an AI assistant can pose significant risks in practical applications. We believe that an AI assistant's refusal to answer questions it does not know is a crucial way to reduce hallucinations and make the assistant truthful. Therefore, in this paper, we ask: "Can AI assistants know what they don't know and express this through natural language?" To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, containing its known and unknown questions, based on existing open-domain question answering datasets. We then align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Experimental results show that after alignment with the Idk dataset, the assistant can refuse to answer most of its unknown questions. For the questions it still attempts to answer, accuracy is significantly higher than before alignment.
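The dataset-construction step can be illustrated with a minimal Python sketch. The abstract does not say how "known" and "unknown" are decided, so everything here is an assumption: we suppose several responses are sampled from the assistant for each question, score each one by a lenient substring match against the dataset's gold answers, and call a question known when the fraction of correct samples reaches a hypothetical `threshold`. Known questions keep one of the assistant's own correct answers as the training target; unknown ones get a hypothetical refusal template, `IDK_RESPONSE`. The names `QAItem`, `build_idk_dataset`, `is_refusal`, and `evaluate` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass

# Hypothetical refusal template used as the training target for unknown questions.
IDK_RESPONSE = "I don't know the answer to this question."


@dataclass
class QAItem:
    question: str
    gold_answers: list[str]     # reference answers from an open-domain QA dataset
    sampled_answers: list[str]  # responses sampled from the assistant being aligned


def is_correct(response: str, gold_answers: list[str]) -> bool:
    """Lenient match (an assumption): correct if any gold answer appears in the response."""
    text = response.lower()
    return any(gold.lower() in text for gold in gold_answers)


def is_refusal(response: str) -> bool:
    """Simplifying assumption: any response containing "i don't know" counts as a refusal."""
    return "i don't know" in response.lower()


def build_idk_dataset(items: list[QAItem], threshold: float = 1.0) -> list[dict]:
    """Label each question known/unknown for this assistant and choose a training target.

    The default threshold (every sample must be correct) is a hypothetical choice.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    dataset = []
    for item in items:
        if not item.sampled_answers:
            raise ValueError(f"no sampled answers for: {item.question!r}")
        correct = [r for r in item.sampled_answers if is_correct(r, item.gold_answers)]
        known = len(correct) / len(item.sampled_answers) >= threshold
        # Known: reuse one of the assistant's own correct answers; unknown: refuse.
        target = correct[0] if known else IDK_RESPONSE
        dataset.append({"question": item.question, "response": target, "known": known})
    return dataset


def evaluate(responses: list[str], gold: list[list[str]]) -> tuple[float, float]:
    """Return (fraction of questions answered, accuracy on the answered questions)."""
    answered = [(r, g) for r, g in zip(responses, gold) if not is_refusal(r)]
    if not answered:
        return 0.0, 0.0
    accuracy = sum(is_correct(r, g) for r, g in answered) / len(answered)
    return len(answered) / len(responses), accuracy


items = [
    QAItem("What is the capital of France?", ["Paris"], ["Paris.", "It is Paris.", "Paris"]),
    QAItem("Who won the 1903 Tour de France?", ["Maurice Garin"],
           ["Henri Cornet", "Maurice Garin", "Lucien Pothier"]),
]
for row in build_idk_dataset(items):
    print(row["known"], "->", row["response"])
# True -> Paris.
# False -> I don't know the answer to this question.
```

`evaluate` mirrors, in simplified form, the abstract's post-alignment comparison: the fraction of questions the assistant still answers and its accuracy on those it attempts. The substring-based `is_correct` and `is_refusal` checks are deliberately crude; a real evaluation would need a more robust answer matcher.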

