State-of-the-art single-agent claim verification methods struggle with complex claims that require nuanced analysis of multifaceted evidence. Inspired by real-world professional fact-checkers, we propose \textbf{DebateCV}, the first debate-driven claim verification framework powered by multiple LLM agents. In DebateCV, two \textit{Debaters} argue opposing stances to surface subtle errors in single-agent assessments. A decisive \textit{Moderator} then weighs the evidential strength of the conflicting arguments to deliver an accurate verdict. However, zero-shot Moderators are biased toward neutral judgments, and no datasets exist for training them. To bridge this gap, we propose \textbf{Debate-SFT}, a post-training framework that leverages synthetic data to enhance agents' ability to effectively adjudicate debates for claim verification. Results show that our methods surpass state-of-the-art non-debate approaches in both accuracy (across various evidence conditions) and justification quality.