FactCHD: Benchmarking Fact-Conflicting Hallucination Detection

Despite their impressive generative capabilities, large language models (LLMs) are hindered by fact-conflicting hallucinations in real-world applications. Accurately identifying hallucinations in LLM-generated text, especially in complex inferential scenarios, remains relatively unexplored. To address this gap, we present FactCHD, a dedicated benchmark for detecting fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. A distinctive element of FactCHD is its integration of fact-based evidence chains, which allows a deeper evaluation of the detectors' explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce Truth-Triangulator, which synthesizes reflective considerations from tool-enhanced ChatGPT and a LoRA-tuned Llama2, aiming to yield more credible detection by combining predictive results and evidence. The benchmark dataset is available at https://github.com/zjunlp/FactCHD.
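To make the idea of combining predictions with evidence concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify how Truth-Triangulator reaches its final judgment, so a plain agreement rule stands in for it. The names `Verdict` and `triangulate`, the label strings, and the source names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    """One detector's output: a label plus the evidence it cites.

    All field names and label values are assumptions for illustration;
    they are not taken from the FactCHD dataset or code.
    """
    source: str          # e.g. "tool-enhanced" or "lora-tuned" (hypothetical)
    label: str           # assumed labels: "FACTUAL" or "NON-FACTUAL"
    evidence: List[str]  # evidence chain as a list of free-text steps


def triangulate(a: Verdict, b: Verdict) -> dict:
    """Merge two verdicts with a simple agreement rule (a stand-in).

    If both detectors agree, keep their label; otherwise surface the
    disagreement as "UNCERTAIN". Evidence from both sides is always kept.
    """
    agree = a.label == b.label
    return {
        "label": a.label if agree else "UNCERTAIN",
        "agree": agree,
        "evidence": a.evidence + b.evidence,
    }
```

Keeping both evidence chains even when the detectors disagree reflects the abstract's emphasis on explanations rather than labels alone; any real system would likely replace the agreement rule with a learned or prompted judge.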

