The increasing reliance on natural language generation (NLG) models, particularly large language models, has raised concerns about the reliability and accuracy of their outputs. A key challenge is hallucination, where models produce plausible but incorrect information. As a result, hallucination detection has become a critical task. In this work, we introduce a comprehensive hallucination taxonomy spanning 11 categories across various NLG tasks and propose the HAllucination Detection (HAD) models (https://github.com/pku0xff/HAD), which integrate hallucination detection, span-level identification, and correction into a single inference process. Trained on a carefully constructed synthetic dataset of about 90K samples, our HAD models are versatile and can be applied to various NLG tasks. We also carefully annotate a test set for hallucination detection, called HADTest, which contains 2,248 samples. Evaluations on in-domain and out-of-domain test sets show that our HAD models generally outperform existing baselines, achieving state-of-the-art results on HaluEval, FactCHD, and FaithBench, confirming their robustness and versatility.