Vulnerability to backdoor attacks has recently threatened the trustworthiness of machine learning models in practical applications. Conventional wisdom suggests that not everyone can be an attacker, since designing a trigger generation algorithm often requires significant effort and extensive experimentation to ensure the attack's stealthiness and effectiveness. In contrast, this paper shows that a more severe backdoor threat exists: anyone can exploit an easily accessible algorithm to mount silent backdoor attacks. Specifically, the attacker can use widely available lossy image compression, offered by a plethora of compression tools, to effortlessly inject a trigger pattern into an image without leaving any noticeable trace; i.e., the generated triggers are natural compression artifacts. No extensive knowledge is required to click the "convert" or "save as" button in a lossy image compression tool. With this attack, the adversary does not need to design a trigger generator as in prior works and only needs to poison the data. Empirically, the proposed attack consistently achieves a 100% attack success rate on several benchmark datasets, including MNIST, CIFAR-10, GTSRB, and CelebA. More significantly, it still achieves an almost 100% attack success rate with a very small (approximately 10%) poisoning rate in the clean-label setting. The trigger generated with one lossy compression algorithm also transfers to other related compression algorithms, exacerbating the severity of this backdoor threat. This work takes another crucial step toward understanding the extensive risks of backdoor attacks in practice, and urges practitioners to investigate similar attacks and relevant backdoor mitigation methods.
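To make the idea of a compression-based trigger concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name a specific codec, so a simplified JPEG-like transform coder (8x8 DCT followed by uniform quantization) stands in for a real lossy compressor. The function names `lossy_compress` and `poison_clean_label`, the `step` and `rate` parameters, and the reading of "clean label" as "compress only target-class images and leave all labels unchanged" are assumptions made for illustration.

```python
import math
import random

N = 8  # block size; JPEG also works on 8x8 blocks


def _dct_matrix(n=N):
    # Orthonormal DCT-II basis (row k = frequency, column i = sample index).
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = _dct_matrix()
CT = [list(col) for col in zip(*C)]  # transpose of C (also its inverse)


def _lossy_block(block, step):
    # Forward DCT, uniform quantization (the lossy step), inverse DCT.
    coeffs = _matmul(_matmul(C, block), CT)
    quantized = [[round(c / step) * step for c in row] for row in coeffs]
    recon = _matmul(_matmul(CT, quantized), C)
    return [[min(255, max(0, int(round(v)))) for v in row] for row in recon]


def lossy_compress(image, step=40.0):
    """Simplified stand-in for a lossy codec on a grayscale image.

    `image` is a list of rows of ints in [0, 255]. Pixels outside full
    8x8 blocks are left untouched. A larger `step` means stronger artifacts.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(0, h - h % N, N):
        for x in range(0, w - w % N, N):
            block = [image[y + i][x:x + N] for i in range(N)]
            recon = _lossy_block(block, step)
            for i in range(N):
                out[y + i][x:x + N] = recon[i]
    return out


def poison_clean_label(images, labels, target_label, rate=0.1, step=40.0, seed=0):
    """Assumed clean-label poisoning: compress a fraction `rate` of the
    dataset, drawn only from `target_label` images; labels are not changed."""
    rng = random.Random(seed)
    candidates = [i for i, y in enumerate(labels) if y == target_label]
    k = min(len(candidates), int(round(rate * len(images))))
    chosen = set(rng.sample(candidates, k))
    poisoned = [lossy_compress(img, step) if i in chosen else img
                for i, img in enumerate(images)]
    return poisoned, sorted(chosen)
```

Under these assumptions, calling `poison_clean_label(images, labels, target_label=1, rate=0.1)` returns the dataset with roughly 10% of its samples replaced by compressed versions, and at inference time the same `lossy_compress` call would play the role of the trigger. In practice an attacker would more likely use an off-the-shelf codec through a "save as" dialog or an image library; the hand-written DCT here only keeps the sketch dependency-free.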