
ChatGPT · Knowledge · Automatic Question Answering · Question Answering · Association

April 20, 2023

Why Does ChatGPT Fall Short in Answering Questions Faithfully?


Shen Zheng, Jie Huang, Kevin Chen-Chuan Chang

From arXiv. Preprint in progress.

Recent advancements in Large Language Models, such as ChatGPT, have demonstrated significant potential to impact various aspects of human life. However, ChatGPT still faces challenges in aspects such as faithfulness. Taking question answering as a representative application, we seek to understand why ChatGPT falls short in answering questions faithfully. To address this question, we analyze the failures of ChatGPT in complex open-domain question answering and identify the abilities underlying these failures. Specifically, we categorize ChatGPT's failures into four types: comprehension, factualness, specificity, and inference. We further pinpoint three critical abilities associated with QA failures: knowledge memorization, knowledge association, and knowledge reasoning. Additionally, we conduct experiments centered on these abilities and propose potential approaches to enhance faithfulness. The results indicate that furnishing the model with fine-grained external knowledge, hints for knowledge association, and guidance for reasoning can empower the model to answer questions more faithfully.



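Related content

ChatGPT

ChatGPT (full name: Chat Generative Pre-trained Transformer) is a chatbot developed by OpenAI in the United States and released on November 30, 2022 [1]. It is an AI-driven natural language processing tool that converses by learning and understanding human language, interacts based on the context of the chat, and can complete tasks such as drafting emails, video scripts, copywriting, translations, code, and papers. [1] https://openai.com/blog/chatgpt/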
