Retrieval-Augmented Generation (RAG) models are critically undermined by citation hallucinations, a deceptive failure mode in which the model cites a source that does not support its claim. Whereas existing work attributes such hallucination to simple over-reliance on parametric knowledge, we reframe it as an evolving, scale-dependent coordination failure between the Attention (reading) and Feed-Forward Network (recalling) pathways. We introduce FACTUM (Framework for Attesting Citation Trustworthiness via Underlying Mechanisms), a framework of four mechanistic scores: Contextual Alignment (CAS), Attention Sink Usage (BAS), Parametric Force (PFS), and Pathway Alignment (PAS). Our analysis reveals that correct citations are consistently marked by higher parametric force (PFS) and greater use of the attention sink (BAS) for information synthesis. Crucially, we find that "one-size-fits-all" theories are insufficient because the signature of correctness evolves with scale: the 3B model relies on high pathway alignment (PAS), whereas our best-performing 8B detector reveals a shift toward a specialized strategy in which the two pathways provide distinct, orthogonal information. By capturing this interplay, FACTUM outperforms state-of-the-art baselines by up to 37.5% in AUC. Our results demonstrate that high parametric force is constructive when it is successfully coordinated with the Attention pathway, paving the way for more nuanced and reliable RAG systems.
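To make the pathway-level framing concrete, below is a minimal sketch (not the authors' released implementation) of how proxies for two of these scores could be instrumented on a small open model. All specifics here are assumptions: GPT-2 stands in for the RAG generator, each block's Attention and MLP outputs stand in for the "reading" and "recalling" pathway contributions, and the final token stands in for the citation token; FACTUM's actual score definitions may differ.

```python
# Sketch: capturing Attention vs. FFN pathway contributions with forward
# hooks, then computing PAS-like and PFS-like proxies at one token.
# Assumptions (not from the paper): GPT-2 as the model, final block only,
# last token as the citation position.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

captured = {}

def save(name):
    def hook(module, inputs, output):
        # GPT2Attention returns a tuple; GPT2MLP returns a plain tensor.
        captured[name] = (output[0] if isinstance(output, tuple) else output).detach()
    return hook

block = model.h[-1]                       # instrument the final transformer block
block.attn.register_forward_hook(save("attn"))
block.mlp.register_forward_hook(save("mlp"))

ids = tok("The capital of France is Paris [1]", return_tensors="pt").input_ids
with torch.no_grad():
    model(ids)

attn_out = captured["attn"][0, -1]        # Attention ("reading") pathway output
mlp_out = captured["mlp"][0, -1]          # FFN ("recalling") pathway output

# PAS-like proxy: cosine similarity of the two pathways' residual-stream
# contributions at the citation token (high = coordinated pathways).
pas = torch.cosine_similarity(attn_out, mlp_out, dim=0)

# PFS-like proxy: relative magnitude of the FFN contribution, i.e. how
# strongly parametric recall pushes the residual stream at this token.
pfs = mlp_out.norm() / (mlp_out.norm() + attn_out.norm())

print(f"PAS-like alignment: {pas:.3f}, PFS-like force: {pfs:.3f}")
```

Under the abstract's framing, a detector would combine such per-token signals (alongside CAS- and BAS-style features) rather than threshold any single one, since the signature of a correct citation is reported to shift with model scale.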