Large language models (LLMs) have made significant progress in NLP. However, their ability to memorize, represent, and leverage commonsense knowledge remains a well-known weakness. In this paper, we focus on ChatGPT, a widely used and easily accessible LLM, and ask the following questions: (1) Can ChatGPT effectively answer commonsense questions? (2) Is ChatGPT aware of the underlying commonsense knowledge needed to answer a specific question? (3) Is ChatGPT knowledgeable in commonsense? (4) Can ChatGPT effectively leverage commonsense for answering questions? We conduct a series of experiments on 11 datasets to evaluate ChatGPT's commonsense abilities, including answering commonsense questions, identifying the necessary knowledge, generating knowledge descriptions, and using those knowledge descriptions to answer the questions again. Experimental results show that: (1) ChatGPT achieves good QA accuracy on commonsense tasks, but still struggles on datasets from certain domains. (2) ChatGPT is knowledgeable and can accurately generate most of the commonsense knowledge when given knowledge prompts. (3) Despite its knowledge, ChatGPT is an inexperienced commonsense problem solver: it cannot precisely identify the commonsense knowledge needed to answer a specific question. These findings point to the need for improved mechanisms for effectively incorporating commonsense into LLMs like ChatGPT, such as better instruction following and commonsense guidance.
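The four evaluation steps named in the abstract (answer, identify the necessary knowledge, generate a knowledge description, answer again) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Model` interface, the function `probe_question`, the prompt wording, and the `toy_model` stand-in are hypothetical and are not the prompts or code used in the paper.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt string, returns the reply text.
Model = Callable[[str], str]


def probe_question(ask: Model, question: str, choices: List[str], gold: str) -> Dict[str, object]:
    """Run the four probing steps on one multiple-choice question."""
    options = ", ".join(choices)

    # Step 1: answer the question directly.
    first = ask(f"Answer with one option.\nQuestion: {question}\nOptions: {options}").strip()

    # Step 2: ask which commonsense knowledge the question requires.
    needed = ask(f"List the knowledge needed to answer: {question}").strip()

    # Step 3: ask for a description of that knowledge.
    description = ask(f"Describe this knowledge: {needed}").strip()

    # Step 4: answer again, this time with the description as context.
    second = ask(
        f"Context: {description}\nAnswer with one option.\n"
        f"Question: {question}\nOptions: {options}"
    ).strip()

    return {
        "needed_knowledge": needed,
        "knowledge_description": description,
        "correct_before": first == gold,
        "correct_after": second == gold,
    }


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to exercise the flow."""
    if prompt.startswith("Context:"):
        return "wet"
    if prompt.startswith("List the knowledge"):
        return "properties of rain"
    if prompt.startswith("Describe this knowledge"):
        return "Rain is liquid water, so objects left in it become wet."
    return "dry"


result = probe_question(
    toy_model, "What happens to a coat left out in the rain?", ["wet", "dry"], "wet"
)
print(result)
```

Replacing `toy_model` with a function that calls an actual model would let the before and after flags be aggregated per dataset. How answers are scored and how generated knowledge is judged are left open here, since the abstract does not specify them.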