Modern large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities for coding tasks including writing and reasoning about code. They improve upon previous neural network models of code, such as code2seq or seq2seq, that already demonstrated competitive results when performing tasks such as code summarization and identifying code vulnerabilities. However, these previous code models were shown vulnerable to adversarial examples, i.e. small syntactic perturbations that do not change the program's semantics, such as the inclusion of "dead code" through false conditions or the addition of inconsequential print statements, designed to "fool" the models. LLMs can also be vulnerable to the same adversarial perturbations but a detailed study on this concern has been lacking so far. In this paper we aim to investigate the effect of adversarial perturbations on coding tasks with LLMs. In particular, we study the transferability of adversarial examples, generated through white-box attacks on smaller code models, to LLMs. Furthermore, to make the LLMs more robust against such adversaries without incurring the cost of retraining, we propose prompt-based defenses that involve modifying the prompt to include additional information such as examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations. Our experiments show that adversarial examples obtained with a smaller code model are indeed transferable, weakening the LLMs' performance. The proposed defenses show promise in improving the model's resilience, paving the way to more robust defensive solutions for LLMs in code-related applications.
翻译:现代大型语言模型(LLMs),如ChatGPT,在编码任务(包括代码编写与推理)中展现出卓越能力。它们改进了此前已在代码摘要和漏洞识别等任务中表现优异的神经代码模型(如code2seq或seq2seq)。然而,先前这些代码模型已被证实易受对抗样本攻击——即通过虚假条件插入“死代码”或添加无关紧要的打印语句等不改变程序语义的微小句法扰动,旨在“欺骗”模型。LLMs同样可能面临此类对抗性扰动的威胁,但至今缺乏系统研究。本文旨在探究对抗性扰动对LLMs编码任务的影响,重点考察通过白盒攻击小型代码模型生成的对抗样本向LLMs的迁移特性。此外,为在不增加重训练成本的前提下提升LLMs的鲁棒性,我们提出基于提示的防御策略,通过修改提示信息以纳入对抗扰动代码示例及显式逆向扰动指令等额外内容。实验表明,基于小型代码模型获取的对抗样本确实具有迁移性,能够削弱LLMs的性能。所提出的防御策略在提升模型韧性方面展现出潜力,为代码相关应用中LLMs的更鲁棒防御方案奠定了基础。