Large language models (LLMs) have made significant progress in NLP. However, their ability to memorize, represent, and leverage commonsense knowledge remains a well-known weakness. In this paper, we focus on ChatGPT, a widely used and easily accessible LLM, and ask the following questions: (1) Can ChatGPT effectively answer commonsense questions? (2) Is ChatGPT aware of the underlying commonsense knowledge needed to answer a specific question? (3) Is ChatGPT knowledgeable in commonsense? (4) Can ChatGPT effectively leverage commonsense to answer questions? We conduct a series of experiments on 11 datasets to evaluate ChatGPT's commonsense abilities: answering commonsense questions, identifying the necessary knowledge, generating knowledge descriptions, and answering the questions again using those descriptions. Experimental results show that: (1) ChatGPT achieves good QA accuracy on commonsense tasks, but still struggles on datasets in certain domains. (2) ChatGPT is knowledgeable and can accurately generate most of the commonsense knowledge when given knowledge prompts. (3) Despite this knowledge, ChatGPT is an inexperienced commonsense problem solver: it cannot precisely identify the commonsense knowledge needed to answer a specific question. These findings point to the need for better mechanisms for incorporating commonsense into LLMs like ChatGPT, such as improved instruction following and commonsense guidance.