Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. Accurately identifying hallucinations in LLM-generated text, especially in complex inferential scenarios, remains relatively unexplored. To address this gap, we present FactCHD, a dedicated benchmark for detecting fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans several factuality patterns: vanilla, multi-hop, comparison, and set operations. A distinctive element of FactCHD is its integration of fact-based evidence chains, which allows a significantly deeper evaluation of detectors' explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce Truth-Triangulator, which synthesizes reflective considerations from a tool-enhanced ChatGPT and a LoRA-tuned Llama2, aiming to yield more credible detection by combining their predictive results and evidence. The benchmark dataset is available at https://github.com/zjunlp/FactCHD.