To combat the spread of LLM-generated misinformation, we present RADAR, a retrieval-augmented detector with adversarial refinement for robust fake news detection. Our approach pairs a generator that rewrites real articles with factual perturbations against a lightweight detector that verifies claims using dense passage retrieval. To enable effective co-evolution, we introduce verbal adversarial feedback (VAF). Rather than relying on scalar rewards, VAF issues structured natural-language critiques; these guide the generator toward more sophisticated evasion attempts, compelling the detector to adapt and improve. On a fake news detection benchmark, RADAR achieves 86.98% ROC-AUC, significantly outperforming retrieval-augmented general-purpose LLMs. Ablation studies confirm that detector-side retrieval yields the largest gains, while VAF and few-shot demonstrations provide critical signals for robust training.
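The generator-detector loop described above can be sketched in miniature. The following is a hypothetical toy illustration, not the paper's implementation: the actual RADAR components are LLM-based, and `retrieve`, `detect`, and `verbal_feedback` are illustrative stand-ins (token-overlap retrieval substitutes for dense passage retrieval, and a numeric-consistency check substitutes for claim verification).

```python
# Toy sketch of one RADAR-style adversarial round with verbal adversarial
# feedback (VAF). All names and logic here are illustrative assumptions.

def retrieve(claim, corpus, k=1):
    """Rank passages by token overlap with the claim
    (a crude stand-in for dense passage retrieval)."""
    scored = sorted(
        corpus,
        key=lambda p: len(set(claim.lower().split()) & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def detect(article, corpus):
    """Label an article fake if a numeric claim conflicts with the
    best-matching retrieved passage (stand-in for claim verification)."""
    evidence = retrieve(article, corpus)[0]
    supported = all(
        tok in evidence.lower()
        for tok in article.lower().split()
        if tok.isdigit()
    )
    return ("real" if supported else "fake"), evidence

def verbal_feedback(prediction, evidence):
    """Structured natural-language critique in place of a scalar reward."""
    if prediction == "fake":
        return (f"Detected: a perturbed figure conflicts with evidence "
                f"'{evidence}'. Try a subtler factual edit.")
    return "Evaded detection; keep this perturbation style."

corpus = ["The city budget for 2023 was 40 million dollars."]
fake_article = "The city budget for 2023 was 90 million dollars."

prediction, evidence = detect(fake_article, corpus)
critique = verbal_feedback(prediction, evidence)  # fed back to the generator
```

In this sketch the critique string, rather than a reward scalar, tells the generator *why* it was caught, which is the signal VAF uses to drive more sophisticated evasions in the next round.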