Large Language Models (LLMs) are gaining momentum in software development, with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce secure code and, thereby, the quality of prompt-generated software. In parallel, various prompting techniques that carefully tailor prompts have emerged to elicit optimal responses from LLMs. Still, the interplay between such prompting strategies and secure code generation remains under-explored and calls for further investigation. OBJECTIVE: In this study, we investigate the impact of different prompting techniques on the security of code generated from NL instructions by LLMs. METHOD: First, we perform a systematic literature review to identify existing prompting techniques that can be used for code-generation tasks. A subset of these techniques is then evaluated on the GPT-3, GPT-3.5, and GPT-4 models for secure code generation, using an existing dataset of 150 security-relevant NL code-generation prompts. RESULTS: Our work (i) classifies potential prompting techniques for code generation, (ii) adapts and evaluates a subset of the identified techniques for secure code-generation tasks, and (iii) observes a reduction in security weaknesses across the tested LLMs, especially after applying an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the ongoing discourse on LLM-generated code security.
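The RCI loop named above can be illustrated as follows. This is a minimal sketch only, not the paper's implementation: `call_llm` is a hypothetical stand-in for a real model API (e.g. a GPT-3.5/GPT-4 endpoint), stubbed here so the structure of the generate-criticize-improve cycle is visible; prompt wording and the number of rounds are assumptions.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; a real implementation would query a model API.
    Stubbed here to keep the sketch self-contained."""
    return f"[model response to: {prompt[:40]}...]"


def rci_generate(task: str, rounds: int = 2) -> str:
    """Recursive Criticism and Improvement (RCI) sketch:
    generate code, then repeatedly ask the model to criticize its own
    output for security weaknesses and rewrite it."""
    # Step 1: initial generation from the NL instruction.
    answer = call_llm(f"Write code for the following task:\n{task}")
    for _ in range(rounds):
        # Step 2: self-criticism focused on security weaknesses (e.g. CWEs).
        critique = call_llm(
            f"Review the following code for security weaknesses:\n{answer}"
        )
        # Step 3: improvement conditioned on the critique.
        answer = call_llm(
            f"Task:\n{task}\nCode:\n{answer}\nCritique:\n{critique}\n"
            f"Rewrite the code, fixing the issues identified above."
        )
    return answer
```

With a real model behind `call_llm`, each iteration feeds the model's own critique back into the next generation, which is the mechanism the abstract credits with reducing security weaknesses.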