Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across many domains and applications. Current hallucination detection and mitigation datasets are limited in domain coverage and size, and they are difficult to scale because of prohibitive labor costs and the insufficient reliability of existing hallucination annotators. To facilitate scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset and improves the accuracy of the hallucination annotator. Based on the Expectation Maximization (EM) algorithm, in each iteration the framework first applies a hallucination annotation pipeline to annotate a scaled-up dataset and then trains a more accurate hallucination annotator on that dataset. This new annotator is then adopted in the annotation pipeline for the next iteration. Extensive experimental results show that the final hallucination annotator, with only 7B parameters, surpasses GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA through zero-shot inference. Such an annotator can not only evaluate the hallucination levels of various LLMs on the large-scale dataset but also help mitigate hallucinations in LLM generations, raising the Natural Language Inference (NLI) metric on HaluEval from 25% to 37%.
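The annotate-then-train loop described in the abstract can be illustrated with a toy hard-EM procedure. This is a hypothetical sketch, not the authors' implementation: `ToyAnnotator`, `annotate`, `train`, and `self_train` are invented names, each item is reduced to a single made-up "hallucination score", and the "annotator" is a simple threshold classifier. The E-step labels a progressively larger dataset with the current annotator; the M-step refits the threshold as the midpoint of the two class means (a 2-means style update), and the refitted annotator is used in the next round.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ToyAnnotator:
    """Hypothetical stand-in for a hallucination annotator: labels an item
    as hallucinated (1) when its score exceeds the threshold, else 0."""
    threshold: float

    def __call__(self, score: float) -> int:
        return int(score > self.threshold)


def annotate(annotator: ToyAnnotator, data: Sequence[float]) -> List[Tuple[float, int]]:
    """E-step analogue: label every item with the current annotator."""
    return [(x, annotator(x)) for x in data]


def train(labeled: Sequence[Tuple[float, int]], previous: ToyAnnotator) -> ToyAnnotator:
    """M-step analogue: refit the threshold as the midpoint of the two class
    means; keep the previous annotator if either class is empty."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    if not pos or not neg:
        return previous
    return ToyAnnotator(threshold=(mean(pos) + mean(neg)) / 2)


def self_train(initial: ToyAnnotator,
               stages: Sequence[Sequence[float]]) -> Tuple[ToyAnnotator, List[float]]:
    """Run one annotate-then-train round per stage; each stage is a larger dataset."""
    annotator, history = initial, []
    for data in stages:
        labeled = annotate(annotator, data)    # annotate the scaled dataset
        annotator = train(labeled, annotator)  # train a new annotator on it
        history.append(annotator.threshold)    # new annotator feeds the next round
    return annotator, history


# Usage with made-up scores: low values are faithful, high values hallucinated.
data = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75, 0.05, 0.95]
stages = [data[:4], data[:8], data]  # progressively scaled-up datasets
final, history = self_train(ToyAnnotator(threshold=0.2), stages)
print(history)  # thresholds after each round; they settle near 0.5
```

The initial threshold of 0.2 wrongly flags the items scored 0.25 and 0.3, while the refitted annotator separates the two groups. In the framework the abstract describes, the annotation step is a full hallucination annotation pipeline and the training step produces a 7B-parameter annotator; the toy only mirrors the iterative control flow.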