Due to the increasing computational demand of Deep Neural Networks (DNNs), companies and organizations have begun to outsource the training process. However, externally trained DNNs may have been compromised by backdoor attacks. It is crucial to defend against such attacks, i.e., to postprocess a suspicious model so that its backdoor behavior is mitigated while its normal prediction power on clean inputs remains uncompromised. To remove the abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such a requirement may be unrealistic, as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing this barrier. We propose a novel defense method that does not require training labels. Through a carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively cleanse backdoor behaviors of a suspicious network with negligible compromise in its normal behavior. In experiments, we show that our method, trained without labels, is on par with state-of-the-art defense methods trained using labels. We also observe promising defense results even on out-of-distribution data, which makes our method practical in settings where the original training data are unavailable. Code is available at: https://github.com/luluppang/BCU.
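As a rough illustration of the two ingredients named in the abstract, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes a toy model stored as a dictionary of flat weight lists, assumes that a chosen subset of layers is redrawn from a zero-mean Gaussian, and assumes a temperature-softened KL divergence as the distillation objective. The function names `reinitialize_layers` and `distillation_loss`, the layer names, and all hyperparameter values are hypothetical. The point it shows is that the training signal comes only from the suspicious model's own outputs, so no labels are needed.

```python
import math
import random


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs. No labels are involved:
    the suspicious (teacher) model's predictions are the only target."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def reinitialize_layers(weights, layer_names, scale=0.05, seed=0):
    """Return a copy of `weights` in which the named layers are redrawn
    from N(0, scale^2). Which layers to reset is an assumption here."""
    rng = random.Random(seed)
    student = {name: list(w) for name, w in weights.items()}
    for name in layer_names:
        student[name] = [rng.gauss(0.0, scale) for _ in student[name]]
    return student


# Hypothetical toy "suspicious" model with two layers.
teacher = {"conv1": [0.2, -0.1, 0.4], "fc": [1.5, -2.0]}
# The student keeps conv1 and starts fc from scratch.
student = reinitialize_layers(teacher, ["fc"])

# Loss on one unlabeled input, given each model's (made-up) logits.
loss = distillation_loss([2.0, 0.5, -1.0], [0.1, 0.0, -0.1])
```

In a real pipeline the student would then be trained to minimize this loss over unlabeled inputs, with the teacher's weights held fixed.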