LLM-based programming assistants offer the promise of programming faster, but with the risk of introducing more security vulnerabilities. Prior work has studied how LLMs could be maliciously fine-tuned to suggest vulnerabilities more often. With the rise of agentic LLMs, which may use results from an untrusted third party, there is a growing risk of attacks on the model's prompt. We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text (under 500 bytes) to a prompt for a programming task. We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code. We evaluate three prompts on seven common LLMs, ranging from basic to state-of-the-art commercial models. Using the HumanEval benchmark, we find that our prompts are broadly effective, with no customization required for different LLMs. Furthermore, the LLMs that perform best on HumanEval are also the best at following our malicious instructions, suggesting that simply scaling language models will not prevent MaPP attacks. Using a dataset of eight CWEs across 16 scenarios, we find that MaPP attacks are also effective at implementing specific, targeted vulnerabilities across a range of models. Our work highlights the need to secure LLM prompts against manipulation and to rigorously audit code generated with the help of LLMs.