Backdoor data detection is traditionally studied in an end-to-end supervised learning (SL) setting. In recent years, however, self-supervised learning (SSL) and transfer learning (TL) have been widely adopted because they require less labeled data, and successful backdoor attacks have been demonstrated in these settings as well. Yet we lack a thorough understanding of how well existing detection methods apply across learning settings. By evaluating 56 attack settings, we show that the performance of most existing detection methods varies significantly across attacks and poison ratios, and that all of them fail on the state-of-the-art clean-label attack. In addition, they either become inapplicable or suffer large performance losses when applied to SSL and TL. We propose a new detection method called Active Separation via Offset (ASSET), which actively induces different model behaviors on backdoor and clean samples to promote their separation. We also provide procedures to adaptively select the number of suspicious points to remove. In the end-to-end SL setting, ASSET outperforms existing methods in both the consistency of its defensive performance across attacks and its robustness to changes in poison ratio; in particular, it is the only method that can detect the state-of-the-art clean-label attack. Moreover, ASSET's average detection rates exceed those of the best existing methods by 69.3% in SSL and 33.2% in TL, providing the first practical backdoor defense for these new deep learning settings. We open-source the project to drive further development and encourage engagement: https://github.com/ruoxi-jia-group/ASSET.