Everyone Can Attack: Repurpose Lossy Compression as a Natural Backdoor Attack

Vulnerability to backdoor attacks has recently threatened the trustworthiness of machine learning models in practical applications. Conventional wisdom suggests that not everyone can be an attacker, since designing a trigger generation algorithm often involves significant effort and extensive experimentation to ensure the attack's stealthiness and effectiveness. In contrast, this paper shows that there exists a more severe backdoor threat: anyone can exploit an easily accessible algorithm for silent backdoor attacks. Specifically, an attacker can employ the widely used lossy image compression available in a plethora of compression tools to effortlessly inject a trigger pattern into an image without leaving any noticeable trace; i.e., the generated triggers are natural artifacts. No extensive knowledge is required to click the "convert" or "save as" button in a lossy image compression tool. With this attack, the adversary does not need to design a trigger generator as in prior works and only needs to poison the data. Empirically, the proposed attack consistently achieves a 100% attack success rate on several benchmark datasets such as MNIST, CIFAR-10, GTSRB and CelebA. More significantly, it can still achieve an almost 100% attack success rate with very small (approximately 10%) poisoning rates in the clean-label setting. The trigger generated with one lossy compression algorithm is also transferable across other related compression algorithms, exacerbating the severity of this backdoor threat. This work takes another crucial step toward understanding the extensive risks of backdoor attacks in practice, urging practitioners to investigate similar attacks and relevant backdoor mitigation methods.
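
To make the threat model concrete, the following is a minimal sketch of how compression-based poisoning could be wired up. It is not the authors' implementation. The names `toy_lossy_compress` and `poison_dataset`, the nested-list grayscale image representation, and the sampling strategy are all assumptions made for illustration. In particular, `toy_lossy_compress` is only a coarse quantizer standing in for a real lossy codec so that the example runs with the Python standard library alone.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]  # hypothetical format: grayscale rows, pixel values in 0..255


def toy_lossy_compress(image: Image, step: int = 32) -> Image:
    """Toy stand-in for a lossy codec: coarse uniform quantization of pixels.

    A real attack would use an actual codec (e.g. JPEG); this placeholder only
    mimics the idea that the codec's artifacts serve as the trigger.
    """
    return [[min(255, (p // step) * step + step // 2) for p in row] for row in image]


def poison_dataset(
    images: Sequence[Image],
    labels: Sequence[int],
    target_label: int,
    rate: float,
    compress: Callable[[Image], Image] = toy_lossy_compress,
    clean_label: bool = False,
    seed: int = 0,
) -> Tuple[List[Image], List[int], List[int]]:
    """Return (images, labels, poisoned_indices) after poisoning a fraction `rate`.

    clean_label=False: any sample may be compressed and is relabeled to the target.
    clean_label=True: only samples already in the target class are compressed,
    so no label is ever changed.
    """
    rng = random.Random(seed)
    if clean_label:
        candidates = [i for i, y in enumerate(labels) if y == target_label]
    else:
        candidates = list(range(len(images)))
    # Cap the budget by the number of eligible samples.
    n_poison = min(len(candidates), round(rate * len(images)))
    chosen = set(rng.sample(candidates, n_poison))
    new_images = [compress(img) if i in chosen else img for i, img in enumerate(images)]
    new_labels = [target_label if i in chosen else y for i, y in enumerate(labels)]
    return new_images, new_labels, sorted(chosen)


if __name__ == "__main__":
    # Ten tiny 2x2 images with alternating labels, poisoned at a 20% rate.
    imgs = [[[10 * i, 200], [37, 255]] for i in range(10)]
    lbls = [i % 2 for i in range(10)]
    _, new_lbls, idx = poison_dataset(imgs, lbls, target_label=1, rate=0.2, clean_label=True)
    print("poisoned indices:", idx)
    print("labels unchanged:", new_lbls == lbls)
```

At inference time, the same `compress` callable would be applied to a test image to activate the backdoor. With an imaging library such as Pillow installed, a real JPEG round trip (saving to an in-memory buffer with `Image.save(buf, format="JPEG", quality=...)` and reloading) could plausibly be substituted for `toy_lossy_compress`; the quality setting to use is not given in the abstract and would have to be chosen by the reader.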

