Recent approaches combining Large Language Models (LLMs) with retrieval-augmented reasoning have shown promise for automated fact verification. To process complex claims, these verification pipelines typically execute multi-stage workflows that coordinate tightly coupled modules, including claim decomposition, evidence gathering, and verdict prediction. However, existing methods optimize individual stages in isolation or rely on fixed heuristics, which limits adaptive coordination among stages and can lead to suboptimal outcomes. In this work, we propose ProFact, an agentic reinforcement learning framework for end-to-end optimization of multi-stage fact verification trajectories. ProFact trains a unified policy to coordinate claim decomposition, evidence seeking, answer generation, and verdict prediction. To address the sparse and delayed supervision provided by final veracity labels, ProFact introduces process-aware rewards that provide stage-level learning signals throughout the verification process. Empirical evaluation shows that ProFact consistently outperforms strong baselines in both verification performance and inference efficiency. These results highlight the effectiveness of process-aware trajectory optimization for multi-stage fact verification.
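To make the idea of process-aware rewards concrete, the following is a minimal illustrative sketch, not ProFact's actual reward design, which the abstract does not specify. It assumes each step in a verification trajectory receives a hypothetical stage-level score in [0, 1], and that the final step additionally receives a terminal reward of 1 when the predicted verdict matches the gold label. The names `Step`, `trajectory_returns`, `process_weight`, and `gamma` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    stage: str             # e.g. "decompose", "seek_evidence", "answer", "verdict"
    process_reward: float  # hypothetical stage-level score in [0, 1]


def trajectory_returns(
    steps: List[Step],
    final_correct: bool,
    process_weight: float = 0.5,
    gamma: float = 1.0,
) -> List[float]:
    """Compute a discounted return for every step of one trajectory.

    Each step gets a weighted stage-level reward; the last step also
    gets the sparse terminal reward from the final veracity label.
    """
    rewards = [process_weight * s.process_reward for s in steps]
    if rewards:
        rewards[-1] += 1.0 if final_correct else 0.0

    # Accumulate returns backwards so that earlier steps receive credit
    # for the rewards that follow them.
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


if __name__ == "__main__":
    trajectory = [
        Step("decompose", 1.0),
        Step("seek_evidence", 0.0),
        Step("verdict", 1.0),
    ]
    print(trajectory_returns(trajectory, final_correct=True))
```

Setting `process_weight=0.0` in this sketch recovers the sparse, outcome-only setting, where every step sees only the delayed terminal signal. A nonzero weight lets steps be distinguished by the quality of their own stage and of the stages that follow.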