Due to the increasing computational demand of Deep Neural Networks (DNNs), companies and organizations have begun to outsource the training process. However, externally trained DNNs can potentially contain backdoors. It is crucial to defend against such attacks, i.e., to postprocess a suspicious model so that its backdoor behavior is mitigated while its normal prediction power on clean inputs remains uncompromised. To remove the abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such a requirement may be unrealistic, as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing this barrier. We propose a novel defense method that does not require training labels. Through carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively cleanse the backdoor behaviors of a suspicious network with negligible compromise to its normal behavior. In experiments, we show that our method, trained without labels, is on par with state-of-the-art defense methods trained using labels. We also observe promising defense results even on out-of-distribution data, which makes our method very practical. Code is available at: https://github.com/luluppang/BCU.
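To make the two ingredients named above concrete, the following is a minimal, dependency-free sketch and not the released implementation. It assumes a toy representation in which each layer is a flat list of weights, and the helper names (`softmax`, `kd_loss`, `reinit_layerwise`), the per-layer ratios, the Gaussian re-initialization, and the temperature are all hypothetical choices made for illustration. The point it shows is that the distillation objective compares the outputs of the suspicious (teacher) model and the re-initialized (student) model, so no labels enter the loss.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs; uses no ground-truth label."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def reinit_layerwise(layers, ratios, rng, std=0.1):
    """Re-draw a per-layer fraction of weights at random.

    layers: list of flat weight lists (toy stand-in for a network).
    ratios: one fraction in [0, 1] per layer (illustrative values only).
    """
    new_layers = []
    for weights, ratio in zip(layers, ratios):
        k = round(ratio * len(weights))
        chosen = set(rng.sample(range(len(weights)), k))
        new_layers.append(
            [rng.gauss(0.0, std) if i in chosen else w
             for i, w in enumerate(weights)]
        )
    return new_layers


# Toy usage: three "layers", with later layers re-initialized more heavily.
rng = random.Random(0)
teacher_layers = [[1.0] * 4, [1.0] * 4, [1.0] * 4]
student_layers = reinit_layerwise(teacher_layers, [0.0, 0.25, 0.5], rng)

# Distillation signal on one unlabeled input: only model outputs are compared.
loss = kd_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0])
```

In a full pipeline, the student would be trained to minimize this loss over unlabeled inputs while the teacher stays fixed; the training loop is omitted here because its details are not given in the text above.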