TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

With tools like GitHub Copilot, automatic code suggestion is no longer a dream in software engineering. These tools, based on large language models, are typically trained on massive corpora of code mined from unvetted public sources. As a result, these models are susceptible to data poisoning attacks where an adversary manipulates the model's training by injecting malicious data. Poisoning attacks could be designed to influence the model's suggestions at run time for chosen contexts, such as inducing the model into suggesting insecure code payloads. To achieve this, prior attacks explicitly inject the insecure code payload into the training data, making the poison data detectable by static analysis tools that can remove such malicious data from the training set. In this work, we demonstrate two novel attacks, COVERT and TROJANPUZZLE, that can bypass static analysis by planting malicious poison data in out-of-context regions such as docstrings. Our most novel attack, TROJANPUZZLE, goes one step further in generating less suspicious poison data by never explicitly including certain (suspicious) parts of the payload in the poison data, while still inducing a model that suggests the entire payload when completing code (i.e., outside docstrings). This makes TROJANPUZZLE robust against signature-based dataset-cleansing methods that can filter out suspicious sequences from the training data. Our evaluation against models of two sizes demonstrates that both COVERT and TROJANPUZZLE have significant implications for practitioners when selecting code used to train or tune code-suggestion models.

翻译：借助GitHub Copilot等工具，自动代码建议已不再是软件工程中的梦想。这些基于大语言模型的工具通常使用从未经审查的公共来源挖掘的海量代码语料进行训练。因此，这类模型易受数据投毒攻击——攻击者通过注入恶意数据操纵模型训练过程。投毒攻击可被设计为在特定上下文场景下影响模型运行时的建议输出，例如诱使模型推荐存在安全漏洞的代码载荷。为实现这一目标，现有攻击会显式地将不安全代码载荷注入训练数据，导致中毒数据可被静态分析工具检测并从训练集中移除。本研究提出两种新型攻击方法：COVERT与TROJANPUZZLE，它们能将恶意中毒数据植入文档字符串等非上下文区域，从而绕过静态分析。其中最创新的TROJANPUZZLE攻击更进一步生成更不易引起怀疑的中毒数据：它从不将载荷中的特定（可疑）部分显式包含在中毒数据中，却依然能诱导模型在完成代码（即文档字符串以外的区域）时建议完整的载荷。这使得TROJANPUZZLE能够抵御基于签名的数据集清洗方法——这类方法通常会从训练数据中过滤可疑序列。我们对两种规模模型的评估表明，COVERT与TROJANPUZZLE对从业者选择用于训练或微调代码建议模型的代码具有重要警示意义。

相关内容

MoDELS

关注 0

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日