Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of tasks in different domains. However, they sometimes generate responses that are logically coherent but factually incorrect or misleading, a phenomenon known as LLM hallucination. Data-driven supervised methods train hallucination detectors by leveraging the internal states of LLMs, but detectors trained on a specific domain often struggle to generalize to other domains. In this paper, we aim to enhance the cross-domain performance of supervised detectors using only in-domain data. We propose a novel framework, prompt-guided internal states for hallucination detection of LLMs (PRISM). By using appropriate prompts to guide changes in the structure related to text truthfulness within the LLM's internal states, we make this structure more salient and consistent across texts from different domains. We integrate our framework with existing hallucination detection methods and conduct experiments on datasets from different domains. The experimental results indicate that our framework significantly enhances the cross-domain generalization of existing hallucination detection methods.
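To make the abstract's core idea concrete, the following is a minimal sketch of prompt-guided internal-state hallucination detection. It assumes a HuggingFace-style causal LM; the guiding prompt text, the choice of layer and token position, and the logistic-regression detector are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: prepend a guiding prompt, extract the LLM's internal states,
# and train a supervised hallucination detector on in-domain data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed model; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Hypothetical truthfulness-eliciting prompt prepended to each text;
# the idea is that it makes truthfulness-related structure in the
# internal states more salient and more consistent across domains.
GUIDE_PROMPT = "Judge whether the following statement is factually correct:\n"

def internal_state(text: str, layer: int = -1) -> torch.Tensor:
    """Return the hidden state of the final token at a chosen layer,
    computed with the guiding prompt prepended to the input text."""
    inputs = tokenizer(GUIDE_PROMPT + text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: tuple of (num_layers + 1) tensors [batch, seq, dim]
    return outputs.hidden_states[layer][0, -1]

def train_detector(texts, labels):
    """Fit a simple supervised detector on (text, label) pairs,
    where label 1 marks a hallucinated statement."""
    feats = torch.stack([internal_state(t) for t in texts]).float().numpy()
    return LogisticRegression(max_iter=1000).fit(feats, labels)
```

Because only the feature-extraction step changes (prompt-guided states instead of raw states), a sketch like this composes directly with existing internal-state detectors, which is the integration the abstract describes.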