Hallucinations, the generation of seemingly convincing yet false statements, remain a major barrier to the safe deployment of LLMs. Building on the strong performance of self-detection methods, we examine the use of structured knowledge representations, namely knowledge graphs, to improve hallucination self-detection. Specifically, we propose a simple yet powerful approach that (i) converts LLM responses into knowledge graphs of entities and relations, and (ii) uses these graphs to estimate the likelihood that a response contains hallucinations. We evaluate the proposed approach with two widely used LLMs, GPT-4o and Gemini-2.5-Flash, across two hallucination detection datasets. To support more reliable future benchmarking, one of these datasets has been manually curated and enhanced, and it is released as a secondary outcome of this work. Compared to standard self-detection methods and SelfCheckGPT, a state-of-the-art approach, our method achieves up to a 16% relative improvement in accuracy and 20% in F1-score. Our results show that LLMs analyse atomic facts more effectively when those facts are structured as knowledge graphs, even when the initial outputs contain inaccuracies. This low-cost, model-agnostic approach paves the way toward safer and more trustworthy language models.
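To make the two-step pipeline concrete, the sketch below outlines one plausible implementation in Python. It is a minimal illustration, not the exact procedure evaluated in the paper: the `call_llm` helper, the prompt wording, and the unsupported-triple aggregation are all assumptions introduced here for clarity.

```python
import json


def call_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM such as GPT-4o or Gemini-2.5-Flash (assumed helper)."""
    raise NotImplementedError


def response_to_triples(response: str) -> list[tuple[str, str, str]]:
    """Step (i): convert an LLM response into (subject, relation, object) triples."""
    prompt = (
        "Extract every atomic fact from the text below as a JSON list of "
        "[subject, relation, object] triples.\n\nText:\n" + response
    )
    return [tuple(t) for t in json.loads(call_llm(prompt))]


def hallucination_score(response: str) -> float:
    """Step (ii): ask the model to judge each triple; return the fraction of
    triples it flags as unsupported (higher = more likely hallucinated)."""
    triples = response_to_triples(response)
    if not triples:
        return 0.0
    flagged = 0
    for subj, rel, obj in triples:
        verdict = call_llm(
            f"Is the fact ({subj}, {rel}, {obj}) accurate? Answer yes or no."
        )
        if verdict.strip().lower().startswith("no"):
            flagged += 1
    return flagged / len(triples)
```

Under these assumptions, the fraction of flagged triples could be thresholded to yield a binary hallucination label or compared against sampling-based baselines such as SelfCheckGPT.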