Hallucinations in large language models (LLMs) refer to the phenomenon of LLMs producing responses that are coherent yet factually inaccurate. This issue undermines the effectiveness of LLMs in practical applications, necessitating research into detecting and mitigating hallucinations in LLMs. Previous studies have mainly concentrated on post-processing techniques for hallucination detection, which tend to be computationally intensive and limited in effectiveness due to their separation from the LLM's inference process. To overcome these limitations, we introduce MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection without requiring manual annotations. Additionally, we present HELM, a new benchmark for evaluating hallucination detection across multiple LLMs, featuring diverse LLM outputs and the internal states of LLMs during inference. Our experiments demonstrate that MIND outperforms existing state-of-the-art methods in hallucination detection.
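To make the core idea of internal-state-based detection concrete, the following is a minimal, hypothetical sketch: a small probe maps an LLM's hidden state at a generated token to a hallucination score. The model name (`gpt2`), probe architecture, and the choice of the last layer's final-token hidden state are illustrative assumptions, not MIND's actual design; in MIND the probe would be trained on automatically constructed (unsupervised) labels rather than left untrained as here.

```python
# Hypothetical sketch of hallucination detection from an LLM's internal states.
# Not the actual MIND implementation: model, probe, and pooling are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
llm = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
llm.eval()


class HallucinationProbe(nn.Module):
    """Small MLP mapping a hidden state to a hallucination probability."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(h)).squeeze(-1)


probe = HallucinationProbe(llm.config.hidden_size)


@torch.no_grad()
def hallucination_score(text: str) -> float:
    """Score a passage using the LLM's own internal states.

    Reads the last layer's hidden state at the final token; the real framework
    may use different layers, tokens, or pooling strategies.
    """
    inputs = tokenizer(text, return_tensors="pt")
    outputs = llm(**inputs)
    last_hidden = outputs.hidden_states[-1][0, -1]  # shape: (hidden_size,)
    return probe(last_hidden).item()


print(hallucination_score("The Eiffel Tower is located in Berlin."))
```

Because the probe reuses hidden states already computed during generation, scoring adds only a small feed-forward pass on top of normal inference, which is what makes this style of detection suitable for real-time use.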