Recent advancements in large language models, such as ChatGPT, have demonstrated significant potential to impact various aspects of human life. However, ChatGPT still faces challenges in providing reliable and accurate answers to user questions. To better understand the model's particular weaknesses in providing truthful answers, we embark on an in-depth exploration of open-domain question answering. Specifically, we undertake a detailed examination of ChatGPT's failures, which we categorize into four types: comprehension, factuality, specificity, and inference. We further pinpoint factuality as the largest contributor to these failures and identify two critical abilities associated with it: knowledge memorization and knowledge recall. Through experiments focusing on factuality, we propose several potential enhancement strategies. Our findings suggest that augmenting the model with granular external knowledge and cues for knowledge recall can enhance the model's factuality in answering questions.