Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation


May 25, 2023


Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev

Large language models (large LMs) are susceptible to producing text with hallucinated content. Self-contradiction, where the LM generates two contradictory sentences within the same context, is an important form of hallucination. In this work, we present a comprehensive analysis on self-contradiction for state-of-the-art, instruction-tuned LMs, including evaluation, detection, and mitigation. To effectively trigger self-contradictions, we design a framework that constrains LMs to generate appropriate sentence pairs. Our evaluation on these sentence pairs reveals that self-contradictions occur frequently across different LMs for both famous and lesser-known topics. Next, we prompt the LMs to detect self-contradictions. Our results indicate that ChatGPT and GPT-4 are able to accurately identify self-contradictions, while Vicuna-13B struggles to do so. For example, with our best prompting method, ChatGPT achieves 91.0% precision and 80.5% recall on the sentence pairs generated by itself. To automatically mitigate self-contradictions, we develop an iterative algorithm that prompts the LMs to remove the detected self-contradictions from the generated text. Our algorithm successfully revises the text such that self-contradictions are significantly reduced, while maintaining its fluency and informativeness. Importantly, our entire pipeline of triggering, detecting, and mitigating self-contradictions is applicable to black-box LMs and does not require any external grounded knowledge.
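The abstract describes a trigger, detect, mitigate loop that needs only black-box access to the LM. The following is a minimal sketch of that idea, not the authors' implementation: the `LM` callable type, the function names `detect_contradiction` and `mitigate`, the `max_rounds` parameter, and all prompt wordings are assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical interface: any black-box LM exposed as "prompt in, text out".
LM = Callable[[str], str]


def detect_contradiction(lm: LM, first: str, second: str) -> bool:
    """Ask the LM whether two sentences contradict each other.

    The prompt wording is an assumption, not the paper's prompt.
    """
    prompt = (
        "Do the following two sentences contradict each other? "
        "Answer yes or no.\n"
        f"Sentence 1: {first}\nSentence 2: {second}"
    )
    return lm(prompt).strip().lower().startswith("yes")


def mitigate(lm: LM, sentences: List[str], context: str,
             max_rounds: int = 3) -> List[str]:
    """Iteratively revise sentences on which the LM contradicts itself."""
    current = list(sentences)
    for _ in range(max_rounds):
        changed = False
        for i, sentence in enumerate(current):
            # Trigger: sample an alternative sentence for the same context.
            alternative = lm(
                f"Context: {context}\n"
                f"Write one sentence covering the same fact as: {sentence}"
            )
            if detect_contradiction(lm, sentence, alternative):
                # Mitigate: ask the LM to drop the conflicting information.
                current[i] = lm(
                    "Rewrite sentence 1 so that it no longer conflicts with "
                    "sentence 2, removing the conflicting information.\n"
                    f"Sentence 1: {sentence}\nSentence 2: {alternative}"
                )
                changed = True
        if not changed:
            # No contradiction detected in a full pass: stop early.
            break
    return current
```

Because `lm` is only ever called as a text-to-text function, the sketch needs no model internals or external knowledge base, which mirrors the black-box property stated in the abstract. Any API client can be wrapped in a one-argument function and passed in as `lm`.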

