Large language models (large LMs) are susceptible to producing text with hallucinated content. Self-contradiction, where the LM generates two contradictory sentences within the same context, is an important form of hallucination. In this work, we present a comprehensive analysis of self-contradiction in state-of-the-art, instruction-tuned LMs, covering evaluation, detection, and mitigation. To effectively trigger self-contradictions, we design a framework that constrains LMs to generate suitable sentence pairs within the same context. Our evaluation on these sentence pairs reveals that self-contradictions occur frequently across different LMs, for both famous and lesser-known topics. Next, we prompt the LMs to detect self-contradictions. Our results indicate that ChatGPT and GPT-4 can accurately identify self-contradictions, while Vicuna-13B struggles to do so. For example, with our best prompting method, ChatGPT achieves 91.0% precision and 80.5% recall on the sentence pairs it generated itself. To automatically mitigate self-contradictions, we develop an iterative algorithm that prompts the LMs to remove the detected self-contradictions from the generated text. Our algorithm revises the text so that self-contradictions are significantly reduced while its fluency and informativeness are maintained. Importantly, our entire pipeline of triggering, detecting, and mitigating self-contradictions is applicable to black-box LMs and does not require any external grounded knowledge.
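To make the iterative mitigation step more concrete, here is a minimal Python sketch of one possible trigger, detect, and revise loop over a black-box LM. It is not the authors' implementation. The prompt wording, the `[TRIGGER]`/`[DETECT]`/`[REVISE]` markers, the naive sentence splitter, the `max_iters` cap, and the toy `stub_lm` standing in for a real model are all hypothetical. The loop only needs a function from a prompt string to a completion string, which matches the black-box setting described above.

```python
import re
from typing import Callable, List

# Hypothetical black-box LM interface: takes a prompt string, returns a completion.
LM = Callable[[str], str]


def split_sentences(text: str) -> List[str]:
    """Naive splitter on sentence-final punctuation (an assumption, not a real tokenizer)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def trigger(lm: LM, prefix: str, sentence: str) -> str:
    """Trigger step: ask the LM for an alternative sentence in the same context."""
    return lm(f"[TRIGGER]\nContext: {prefix}\nOriginal: {sentence}\n"
              "Write another sentence about the same subject that could follow the context.").strip()


def is_contradiction(lm: LM, prefix: str, a: str, b: str) -> bool:
    """Detection step: ask the LM whether sentences A and B contradict each other."""
    answer = lm(f"[DETECT]\nContext: {prefix}\nSentence A: {a}\nSentence B: {b}\n"
                "Do A and B contradict each other? Answer Yes or No.")
    return answer.strip().lower().startswith("yes")


def revise(lm: LM, prefix: str, a: str, b: str) -> str:
    """Mitigation step: ask the LM to rewrite A without the information that conflicts with B."""
    return lm(f"[REVISE]\nContext: {prefix}\nSentence A: {a}\nSentence B: {b}\n"
              "Rewrite A, removing the information that conflicts with B.").strip()


def mitigate(lm: LM, text: str, max_iters: int = 3) -> str:
    """Repeat full trigger/detect/revise passes until a pass changes nothing or max_iters is hit."""
    for _ in range(max_iters):
        sentences = split_sentences(text)
        changed = False
        for i, sentence in enumerate(sentences):
            prefix = " ".join(sentences[:i])  # context preceding the current sentence
            alternative = trigger(lm, prefix, sentence)
            if is_contradiction(lm, prefix, sentence, alternative):
                sentences[i] = revise(lm, prefix, sentence, alternative)
                changed = True
        text = " ".join(sentences)
        if not changed:
            break
    return text


def stub_lm(prompt: str) -> str:
    """Toy deterministic stand-in for a real LM, for demonstration only."""
    if prompt.startswith("[TRIGGER]"):
        if "3 million" in prompt:
            return "The city has 2 million residents."
        # Otherwise echo the original sentence back, i.e. a consistent alternative.
        return prompt.split("Original: ", 1)[1].split("\n", 1)[0]
    if prompt.startswith("[DETECT]"):
        return "Yes" if "3 million" in prompt and "2 million" in prompt else "No"
    if prompt.startswith("[REVISE]"):
        return "The city has millions of residents."
    return ""


if __name__ == "__main__":
    text = "Paris is the capital of France. The city has 3 million residents."
    print(mitigate(stub_lm, text))
    # prints "Paris is the capital of France. The city has millions of residents."
```

Because each pass re-triggers every sentence against its current prefix, a revision that introduces a new conflict would be caught on the next pass. The assumed `max_iters` cap bounds the total number of LM calls when the model keeps producing conflicting alternatives.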