Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, security issues, particularly backdoor attacks, are often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality: they work well for some models and scenarios but fail for others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of ``early learning'' as a general occurrence during the training of CLMs. This phenomenon refers to a model initially focusing on the main features of the training data while becoming increasingly sensitive to backdoor triggers over time, which leads to overfitting and susceptibility to backdoor attacks. We then show that this overfitting to backdoor triggers results from the use of the cross-entropy loss function, whose unboundedness leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose DeCE (Deceptive Cross-Entropy), a general and effective loss function that blends deceptive distributions and applies label smoothing to keep the gradient bounded, preventing the model from overfitting to backdoor triggers and thereby enhancing the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.
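The core idea above — blending the model's own prediction into the target distribution and smoothing the labels so that the per-example loss (and hence its gradient) stays bounded — can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact formulation: the blending schedule `alpha ** epoch`, the smoothing mass `eps`, and the function name `dece_loss` are all assumptions made for the sketch.

```python
import numpy as np

def dece_loss(probs, label_idx, num_classes, epoch, alpha=0.99, eps=0.1):
    """Sketch of a DeCE-style loss for one token/example.

    probs      : model's predicted probability vector (sums to 1)
    label_idx  : index of the ground-truth class
    epoch      : current training epoch (blending decays with it)
    alpha, eps : assumed hyperparameters (blend schedule, smoothing mass)
    """
    # Label smoothing: move eps of the probability mass off the true class.
    smoothed = np.full(num_classes, eps / (num_classes - 1))
    smoothed[label_idx] = 1.0 - eps

    # "Deceptive" distribution: blend the prediction with the smoothed label.
    # Early on (blend ~ 1) it follows the model; over epochs the label share
    # grows, so the true-class probability is bounded away from zero and the
    # loss cannot blow up on confidently mislabeled (poisoned) examples.
    blend = alpha ** epoch
    deceptive = blend * probs + (1.0 - blend) * smoothed

    # Cross-entropy of the smoothed label against the deceptive distribution.
    return float(-np.sum(smoothed * np.log(deceptive)))
```

For example, with a prediction concentrated on a wrong class (as a trigger would induce), the loss remains finite at any epoch greater than zero, because the blended target reserves `(1 - blend) * (1 - eps)` probability for the true class, whereas plain cross-entropy would diverge as the true-class probability approaches zero.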