To efficiently combat the spread of LLM-generated misinformation, we present RADAR, a Retrieval-Augmented Detector with Adversarial Refinement for robust fake news detection. Our approach employs a generator that rewrites real articles with factual perturbations, paired with a lightweight detector that verifies claims using dense passage retrieval. To enable effective co-evolution, we introduce verbal adversarial feedback (VAF). Rather than relying on scalar rewards, VAF issues structured natural-language critiques; these guide the generator toward more sophisticated evasion attempts, compelling the detector to adapt and improve. On a fake news detection benchmark, RADAR consistently outperforms strong retrieval-augmented trainable baselines, as well as general-purpose LLMs with retrieval. Further analysis shows that detector-side retrieval yields the largest gains, while VAF and few-shot demonstrations provide complementary benefits. RADAR also transfers better to fake news generated by an unseen external attacker, indicating improved robustness beyond the co-evolved training setting.
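The generator-detector loop with verbal adversarial feedback described above can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`Critique`, `retrieve`, `detect`, `generate`, `co_evolve`) is hypothetical, a word-overlap lookup stands in for dense passage retrieval, and hard-coded rules stand in for the trained generator and detector. It only shows the control flow: the detector returns a structured natural-language critique instead of a scalar reward, the generator changes its perturbation in response, and fakes that slip through are collected as new training material for the detector.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Critique:
    """Structured verbal feedback (hypothetical schema, used instead of a scalar reward)."""
    detected: bool   # did the detector flag the article as fake?
    evidence: str    # passage the detector retrieved to verify the article
    advice: str      # natural-language hint the generator can act on


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: list[str]) -> str:
    """Toy stand-in for dense passage retrieval: pick the passage with most word overlap."""
    query_tokens = tokens(query)
    return max(corpus, key=lambda passage: len(query_tokens & tokens(passage)))


def detect(article: str, corpus: list[str]) -> Critique:
    """Toy detector: flags any number in the article that the retrieved evidence lacks."""
    evidence = retrieve(article, corpus)
    evidence_numbers = set(re.findall(r"\d+", evidence))
    unsupported = [n for n in re.findall(r"\d+", article) if n not in evidence_numbers]
    if unsupported:
        advice = (f"The figure {unsupported[0]} is not supported by the retrieved "
                  "evidence; perturb something other than a number.")
        return Critique(True, evidence, advice)
    return Critique(False, evidence, "No unsupported figure found.")


def generate(real_article: str, critique: Optional[Critique]) -> str:
    """Toy generator: starts with a numeric perturbation, then follows the critique."""
    if critique is None:
        # First attempt: bump the first number by one.
        return re.sub(r"\d+", lambda m: str(int(m.group()) + 1), real_article, count=1)
    # Assumption: a real generator would read critique.advice as part of its prompt.
    # Here the response is hard-coded: add a claim that contains no checkable figure.
    return real_article + " Officials privately disputed the report."


def co_evolve(real_article: str, corpus: list[str], rounds: int = 3):
    """Run the adversarial loop; return the round history and the fakes that evaded detection."""
    critique: Optional[Critique] = None
    history, hard_examples = [], []
    for _ in range(rounds):
        fake = generate(real_article, critique)
        critique = detect(fake, corpus)
        history.append((fake, critique))
        if not critique.detected:
            # An evasion: in a full system this would become detector training data.
            hard_examples.append(fake)
            break
    return history, hard_examples
```

In a full system, the collected `hard_examples` would be used to update the detector before the next round, which is the step that makes the two components co-evolve; the sketch stops at collecting them.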