Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, security issues, particularly backdoor attacks, are often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality: they work well for some models and scenarios but fail for others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of ``early learning'' as a general occurrence during the training of CLMs. This phenomenon refers to a model initially focusing on the main features of the training data while becoming increasingly sensitive to backdoor triggers over time, which leads to overfitting and susceptibility to backdoor attacks. We then show that this overfitting to backdoor triggers results from the use of the cross-entropy loss function, whose unboundedness leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose DeCE (Deceptive Cross-Entropy), a general and effective loss function that blends deceptive distributions and applies label smoothing to keep the gradient bounded, preventing the model from overfitting to backdoor triggers and thereby enhancing the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.
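The core idea above — blending the model's own prediction into the target distribution and smoothing the labels so that the per-example loss (and hence its gradient) stays bounded — can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact formulation: the blending schedule `alpha ** epoch`, the smoothing mass `eps`, and the function name `dece_loss` are all assumptions made for the sketch.

```python
import numpy as np

def dece_loss(probs, label_idx, num_classes, epoch, alpha=0.99, eps=0.1):
    """Sketch of a DeCE-style loss for one token/example.

    probs      : model's predicted probability vector (sums to 1)
    label_idx  : index of the ground-truth class
    epoch      : current training epoch (blending decays with it)
    alpha, eps : assumed hyperparameters (blend schedule, smoothing mass)
    """
    # Label smoothing: move eps of the probability mass off the true class.
    smoothed = np.full(num_classes, eps / (num_classes - 1))
    smoothed[label_idx] = 1.0 - eps

    # "Deceptive" distribution: blend the prediction with the smoothed label.
    # Early on (blend ~ 1) it follows the model; over epochs the label share
    # grows, so the true-class probability is bounded away from zero and the
    # loss cannot blow up on confidently mislabeled (poisoned) examples.
    blend = alpha ** epoch
    deceptive = blend * probs + (1.0 - blend) * smoothed

    # Cross-entropy of the smoothed label against the deceptive distribution.
    return float(-np.sum(smoothed * np.log(deceptive)))
```

For example, with a prediction concentrated on a wrong class (as a trigger would induce), the loss remains finite at any epoch greater than zero, because the blended target reserves `(1 - blend) * (1 - eps)` probability for the true class, whereas plain cross-entropy would diverge as the true-class probability approaches zero.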