ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

Tool-using LLM agents increasingly use the Model Context Protocol (MCP) to answer from heterogeneous evidence sources, including search, APIs, databases, clinical records, and formulary tools. Standard factuality metrics usually test whether an answer is supported by pooled evidence, missing a provenance-sensitive failure mode: a claim may be supported somewhere while being attributed to the wrong source. We call this cross-source conflation. We introduce ProvenanceGuard, a source-aware verifier for MCP-grounded answers. It consumes captured MCP traces with stable tool IDs, source IDs, and raw outputs; decomposes answers into atomic claims; routes claims to source-specific evidence; checks support with NLI and a token-alignment proxy; compares stated attribution with the routed source; and returns per-claim verdicts plus an answer-level allow/block decision. Blocked answers can be repaired with retrieval-augmented answer revision and re-verified. We evaluate on 281 medical-domain MCP-agent traces. A 266-trace adjudicated subset yields 2,325 LLM-assisted claim labels split by trace; 361 held-out labels are human-verified. On the 40-trace held-out split, ProvenanceGuard achieves block F1 0.802 and source accuracy 0.858 over 260 source-eligible claims, outperforming source-blind baselines that do not emit claim-to-source IDs. On a harder multi-source benchmark it reaches block F1 0.846, while source-plus-relation accuracy drops to 0.229, showing that exact source ownership remains difficult with semantically close sources. Repair-and-reverify resolves all blocked answers in the full trace set, often via conservative fallback. In 50 controlled clinical conflation probes, ProvenanceGuard detects all injected attribution swaps with no retained wrong attribution. These results show that source attribution is an independent axis for factuality verification in MCP-based agents.
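The verification flow described above (route each claim to source-specific evidence, check support, compare the stated attribution with the routed source, then make an answer-level allow/block decision) can be illustrated with a minimal sketch. This is not ProvenanceGuard's implementation: the names `Claim`, `support_score`, `verify_claim`, and `verify_answer`, the 0.8 threshold, and the toy sources are all hypothetical, claims are assumed to be already decomposed, and a crude token-overlap score stands in for the NLI and token-alignment checks.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Claim:
    """An atomic claim plus the source ID the answer attributes it to (hypothetical type)."""
    text: str
    stated_source: Optional[str]


def _tokens(text: str) -> set:
    # Lowercased alphanumeric tokens; a deliberately simple tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim_text: str, evidence_text: str) -> float:
    """Fraction of claim tokens found in the evidence (toy stand-in for NLI)."""
    claim_tokens = _tokens(claim_text)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & _tokens(evidence_text)) / len(claim_tokens)


def verify_claim(claim: Claim, sources: Dict[str, str], threshold: float = 0.8) -> dict:
    """Route the claim to its best-supporting source, then check the attribution."""
    routed, score = max(
        ((sid, support_score(claim.text, evidence)) for sid, evidence in sources.items()),
        key=lambda pair: pair[1],
        default=(None, 0.0),
    )
    if routed is None or score < threshold:
        return {"verdict": "unsupported", "routed_source": None}
    if claim.stated_source is not None and claim.stated_source != routed:
        # Supported somewhere, but attributed to the wrong source.
        return {"verdict": "conflated", "routed_source": routed}
    return {"verdict": "supported", "routed_source": routed}


def verify_answer(claims: List[Claim], sources: Dict[str, str]) -> Tuple[str, List[dict]]:
    """Block the answer if any claim is unsupported or conflated."""
    verdicts = [verify_claim(claim, sources) for claim in claims]
    allowed = all(v["verdict"] == "supported" for v in verdicts)
    return ("allow" if allowed else "block"), verdicts


# Toy evidence keyed by source ID; the contents are invented for illustration.
SOURCES = {
    "formulary": "DrugX maximum daily dose is 40 mg for adults.",
    "clinical_record": "Patient eGFR is 38 and current medication includes DrugY.",
}

CLAIMS = [
    # Supported by the formulary, but attributed to the clinical record.
    Claim("DrugX maximum daily dose is 40 mg", stated_source="clinical_record"),
    Claim("Patient eGFR is 38", stated_source="clinical_record"),
]

decision, verdicts = verify_answer(CLAIMS, SOURCES)
print(decision, [v["verdict"] for v in verdicts])
```

A source-blind check over the pooled evidence would accept both claims here, since each is supported somewhere. The per-source routing is what exposes the first claim as a cross-source conflation and blocks the answer.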
