Automated fact-checking (AFC) systems are susceptible to adversarial attacks, enabling false claims to evade detection. Existing adversarial frameworks typically rely on injecting noise or altering semantics, yet none exploits the adversarial potential of persuasion techniques, which are widely used in disinformation campaigns to manipulate audiences. In this paper, we introduce a novel class of persuasive adversarial attacks on AFC systems, employing a generative LLM to rephrase claims using persuasion techniques. Considering 15 techniques grouped into 6 categories, we study the effects of persuasion on both claim verification and evidence retrieval using a decoupled evaluation strategy. Experiments on the FEVER and FEVEROUS benchmarks show that persuasion attacks can substantially degrade both verification performance and evidence retrieval. Our analysis identifies persuasion techniques as a potent class of adversarial attacks, highlighting the need for more robust AFC systems.
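To make the attack concrete, the rephrasing step can be sketched as a prompt-construction routine: given a claim and a persuasion technique, build an instruction for a generative LLM that rewrites the claim persuasively while preserving its factual content (so the gold verification label is unchanged). This is a minimal illustrative sketch; the technique names, their descriptions, and the prompt wording are assumptions for illustration, not the paper's exact taxonomy or prompts.

```python
# Hypothetical sketch of a persuasion-based adversarial rephrasing attack.
# Technique names and prompt wording below are illustrative assumptions,
# not the exact 15-technique taxonomy or prompts used in the paper.

PERSUASION_TECHNIQUES = {
    "appeal_to_authority": "Attribute the claim to a credible-sounding expert or institution.",
    "loaded_language": "Use emotionally charged wording while keeping the factual content intact.",
    "bandwagon": "Frame the claim as something widely accepted by most people.",
}

def build_attack_prompt(claim: str, technique: str) -> str:
    """Build an LLM prompt that rewrites a claim with a persuasion technique,
    while instructing the model to preserve the claim's truth conditions."""
    instruction = PERSUASION_TECHNIQUES[technique]
    return (
        f"Rewrite the following claim using the persuasion technique "
        f"'{technique}': {instruction}\n"
        "Do not change the claim's factual content or add new facts.\n"
        f"Claim: {claim}\n"
        "Rewritten claim:"
    )

# Example: the resulting prompt would be sent to a generative LLM,
# and the rewritten claim fed to the AFC system under attack.
prompt = build_attack_prompt(
    "The Eiffel Tower was completed in 1889.", "appeal_to_authority"
)
print(prompt)
```

Keeping the factual content fixed is what makes this a pure surface-form attack: any drop in verification or retrieval performance is attributable to the persuasive rephrasing rather than a change in the claim's meaning.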