With tools like GitHub Copilot, automatic code suggestion is no longer a dream in software engineering. These tools, based on large language models, are typically trained on massive corpora of code mined from unvetted public sources. As a result, these models are susceptible to data poisoning attacks where an adversary manipulates the model's training by injecting malicious data. Poisoning attacks could be designed to influence the model's suggestions at run time for chosen contexts, such as inducing the model into suggesting insecure code payloads. To achieve this, prior attacks explicitly inject the insecure code payload into the training data, making the poison data detectable by static analysis tools that can remove such malicious data from the training set. In this work, we demonstrate two novel attacks, COVERT and TROJANPUZZLE, that can bypass static analysis by planting malicious poison data in out-of-context regions such as docstrings. Our most novel attack, TROJANPUZZLE, goes one step further in generating less suspicious poison data by never explicitly including certain (suspicious) parts of the payload in the poison data, while still inducing a model that suggests the entire payload when completing code (i.e., outside docstrings). This makes TROJANPUZZLE robust against signature-based dataset-cleansing methods that can filter out suspicious sequences from the training data. Our evaluation against models of two sizes demonstrates that both COVERT and TROJANPUZZLE have significant implications for practitioners when selecting code used to train or tune code-suggestion models.
翻译:借助GitHub Copilot等工具,自动代码建议已不再是软件工程中的梦想。这些基于大语言模型的工具通常使用从未经审查的公共来源挖掘的海量代码语料进行训练。因此,这类模型易受数据投毒攻击——攻击者通过注入恶意数据操纵模型训练过程。投毒攻击可被设计为在特定上下文场景下影响模型运行时的建议输出,例如诱使模型推荐存在安全漏洞的代码载荷。为实现这一目标,现有攻击会显式地将不安全代码载荷注入训练数据,导致中毒数据可被静态分析工具检测并从训练集中移除。本研究提出两种新型攻击方法:COVERT与TROJANPUZZLE,它们能将恶意中毒数据植入文档字符串等非上下文区域,从而绕过静态分析。其中最创新的TROJANPUZZLE攻击更进一步生成更不易引起怀疑的中毒数据:它从不将载荷中的特定(可疑)部分显式包含在中毒数据中,却依然能诱导模型在完成代码(即文档字符串以外的区域)时建议完整的载荷。这使得TROJANPUZZLE能够抵御基于签名的数据集清洗方法——这类方法通常会从训练数据中过滤可疑序列。我们对两种规模模型的评估表明,COVERT与TROJANPUZZLE对从业者选择用于训练或微调代码建议模型的代码具有重要警示意义。