Recent studies have demonstrated the outstanding capabilities of large language models (LLMs) in the software engineering domain, covering numerous tasks such as code generation and comprehension. While the benefits of LLMs for coding tasks are well noted, LLMs are also perceived to be vulnerable to adversarial attacks. In this paper, we study the vulnerability of LLMs to imperceptible character attacks, a type of prompt-injection attack that uses special characters to befuddle an LLM while keeping the attack hidden from human eyes. We devise four categories of attacks and investigate their effects on the performance of tasks related to code analysis and code comprehension. Two generations of ChatGPT are included to evaluate the impact of advancements made in contemporary models. Our experimental design consists of comparing perturbed and unperturbed code snippets and evaluating two performance outcomes: model confidence, measured using the log probabilities of the response, and correctness of the response. We find that the earlier version of ChatGPT exhibits a strong negative linear correlation between the amount of perturbation and the performance outcomes, while the more recent ChatGPT shows a strong negative correlation between the presence of perturbation and the performance outcomes, but no valid correlation between the perturbation budget and the performance outcomes. We anticipate that this work contributes to an in-depth understanding of leveraging LLMs for coding tasks. We suggest that future research delve into how to create LLMs that can return a correct response even when the prompt contains perturbations.
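To illustrate the kind of perturbation studied here, the following is a minimal sketch, not the paper's actual attack implementation: it inserts invisible Unicode characters into a code snippet under a given perturbation budget, so the snippet renders identically to human eyes while its underlying character sequence, and hence the LLM's tokenization, changes. The specific character set and the `perturb` helper are illustrative assumptions.

```python
import random

# Illustrative set of invisible Unicode characters (an assumption,
# not the paper's exact attack alphabet).
INVISIBLE_CHARS = [
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2064",  # INVISIBLE PLUS
]

def perturb(code: str, budget: int, seed: int = 0) -> str:
    """Insert `budget` invisible characters at random positions in `code`.

    The perturbation budget controls how many characters are injected;
    the result looks unchanged when rendered, but differs in code points.
    """
    rng = random.Random(seed)
    chars = list(code)
    for _ in range(budget):
        pos = rng.randrange(len(chars) + 1)
        chars.insert(pos, rng.choice(INVISIBLE_CHARS))
    return "".join(chars)

snippet = "def add(a, b):\n    return a + b\n"
attacked = perturb(snippet, budget=5)

# Visually identical when printed, but longer in code points.
assert attacked != snippet
assert len(attacked) == len(snippet) + 5
```

In an experiment like the one described, the perturbed and unperturbed snippets would then be submitted to the model in otherwise identical prompts, with response correctness and log-probability-based confidence compared across budgets.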