Recent advances in large language models (LLMs) such as ChatGPT have shown significant potential to affect many aspects of human life. However, ChatGPT still faces challenges in areas such as faithfulness. Taking question answering (QA) as a representative application, we seek to understand why ChatGPT falls short in answering questions faithfully. To this end, we analyze ChatGPT's failures in complex open-domain question answering and identify the abilities underlying those failures. Specifically, we categorize ChatGPT's failures into four types: comprehension, factualness, specificity, and inference. We further pinpoint three critical abilities associated with QA failures: knowledge memorization, knowledge association, and knowledge reasoning. Additionally, we conduct experiments centered on these abilities and propose potential approaches to enhance faithfulness. The results indicate that providing the model with fine-grained external knowledge, hints for knowledge association, and guidance for reasoning can help it answer questions more faithfully.