Recently, code-oriented large language models (Code LLMs) have been widely and successfully adopted to simplify and facilitate programming. With these tools, developers can easily generate complete functional code from incomplete code and natural-language prompts. However, pioneering works have revealed that Code LLMs are vulnerable, e.g., to backdoor and adversarial attacks. The former can induce LLMs to respond to triggers and insert malicious code snippets by poisoning the training data or model parameters, while the latter crafts malicious adversarial input code to degrade the quality of the generated code. Both attack methods have inherent limitations: backdoor attacks require control over the model training process, while adversarial attacks struggle to fulfill specific malicious purposes. To inherit the advantages of both backdoor and adversarial attacks, this paper proposes a new attack paradigm, i.e., target-specific and adversarial prompt injection (TAPI), against Code LLMs. TAPI generates unreadable comments containing information about malicious instructions and hides them as triggers in external source code. When users exploit Code LLMs to complete code containing the trigger, the models generate attacker-specified malicious code snippets at specific locations. We evaluate our TAPI attack on four representative LLMs under three representative malicious objectives and seven cases. The results show that our method is highly threatening (achieving an attack success rate of up to 89.3\%) and stealthy (saving an average of 53.1\% of tokens in the trigger design). In particular, we successfully attacked some well-known deployed code-completion applications, including CodeGeeX and GitHub Copilot, which further confirms the realistic threat of our attack.