The propensity of Large Language Models (LLMs) to hallucinate remains a critical barrier to their adoption in high-stakes domains. Current detection methods often rely on surface-level signals from the model's output, overlooking failures that occur within the model's internal reasoning process. In this paper, we introduce a new paradigm for hallucination detection that analyzes the dynamic topology of a model's layer-wise attention as it evolves through the network. We model the sequence of attention matrices as a zigzag graph filtration and apply zigzag persistence, a tool from Topological Data Analysis, to extract a topological signature. Our core hypothesis is that factual and hallucinated generations exhibit distinct topological signatures. We validate our framework, HalluZig, on multiple benchmarks, demonstrating that it outperforms strong baselines. Furthermore, our analysis reveals that these topological signatures generalize across models, and that hallucination detection remains possible using structural signatures from only a fraction of the network's depth.
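To make the attention-to-zigzag construction concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical threshold `TAU` for turning each layer's attention matrix into an undirected graph (symmetrizing the directed attention weights), builds the zigzag of layer graphs interleaved with their pairwise unions, and computes zigzag persistence with the Dionysus 2 library's `zigzag_homology_persistence`; the helper names and the toy usage at the end are illustrative.

```python
# Sketch: attention matrices -> zigzag graph filtration -> zigzag persistence.
# Assumptions (not from the paper): threshold TAU, symmetrized attention,
# and the Dionysus 2 library (pip install dionysus) for the computation.
import numpy as np
import dionysus as d

TAU = 0.1  # hypothetical attention threshold


def layer_graphs(attn):
    """attn: list of (n x n) attention matrices, one per layer.
    Returns, per layer, the edge set whose (symmetrized) weight exceeds TAU."""
    graphs = []
    for A in attn:
        n = A.shape[0]
        edges = {(i, j) for i in range(n) for j in range(i + 1, n)
                 if max(A[i, j], A[j, i]) > TAU}
        graphs.append(edges)
    return graphs


def zigzag_signature(attn):
    """Build the zigzag G_1 <= G_1 ∪ G_2 >= G_2 <= ... over the layer
    graphs and return its zigzag persistence diagrams."""
    graphs = layer_graphs(attn)
    n = attn[0].shape[0]

    # Time axis: layer graph G_k lives at even time 2k; the union
    # G_k ∪ G_{k+1} lives at the odd time 2k+1 between them.
    def alive_at(edge, t):
        k, r = divmod(t, 2)
        if r == 0:
            return edge in graphs[k]
        return edge in graphs[k] or edge in graphs[k + 1]

    T = 2 * len(graphs) - 1
    simplices, times = [], []
    # Vertices (tokens) are present throughout; faces precede cofaces.
    for v in range(n):
        simplices.append([v])
        times.append([0.0])
    # Each edge's times alternate entry/exit, as Dionysus expects.
    for e in set().union(*graphs):
        pattern, prev = [], False
        for t in range(T):
            cur = alive_at(e, t)
            if cur != prev:
                pattern.append(float(t))
                prev = cur
        simplices.append(list(e))
        times.append(pattern)

    f = d.Filtration(simplices)
    zz, dgms, cells = d.zigzag_homology_persistence(f, times)
    return dgms  # dgms[0]: H0 (connectivity), dgms[1]: H1 (cycles)


# Toy usage with random stand-in "attention" matrices (illustrative only):
attn = [np.random.rand(8, 8) for _ in range(6)]  # 6 layers, 8 tokens
for dim, dgm in enumerate(zigzag_signature(attn)):
    print(f"H{dim}:", [(p.birth, p.death) for p in dgm])
```

The resulting diagrams record when connected components and cycles in the attention graph appear and disappear across layers; any downstream featurization of these diagrams into a classifier input is left out of this sketch.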