Hallucinations in large language models (LLMs) are responses that are coherent yet factually inaccurate. They undermine the effectiveness of LLMs in practical applications, which motivates research into detecting and mitigating them. Previous studies have concentrated mainly on post-processing techniques for hallucination detection, which tend to be computationally intensive and limited in effectiveness because they are separated from the LLM's inference process. To overcome these limitations, we introduce MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection without requiring manual annotations. We also present HELM, a new benchmark for evaluating hallucination detection across multiple LLMs, featuring diverse LLM outputs together with the models' internal states during inference. Our experiments demonstrate that MIND outperforms existing state-of-the-art methods in hallucination detection.