Mis- and disinformation are a substantial global threat to our security and safety. To cope with the scale of online misinformation, researchers have been working on automating fact-checking by retrieving relevant evidence and verifying claims against it. However, despite many advances, a comprehensive evaluation of the possible attack vectors against such systems is still lacking. In particular, the automated fact-verification process might be vulnerable to the very disinformation campaigns it is trying to combat. In this work, we assume an adversary that automatically tampers with online evidence in order to disrupt the fact-checking model, either by camouflaging the relevant evidence or by planting misleading evidence. We first propose an exploratory taxonomy that spans these two targets and the different threat model dimensions. Guided by this taxonomy, we design and propose several potential attack methods. We show that it is possible to subtly modify claim-salient snippets in the evidence and to generate diverse, claim-aligned evidence. As a result, the attacks substantially degrade fact-checking performance under many different permutations of the taxonomy's dimensions. The attacks are also robust against post-hoc modifications of the claim. Our analysis further hints at potential limitations in models' inference when faced with contradicting evidence. We emphasize that these attacks can have harmful implications for the inspectable and human-in-the-loop usage scenarios of such models, and we conclude by discussing challenges and directions for future defenses.