Large language models (LMs) are increasingly pretrained on massive codebases and used to generate code. However, LMs lack awareness of security and have been found to frequently produce unsafe code. This work studies the security of LMs along two important axes: (i) security hardening, which aims to enhance LMs' reliability in generating secure code, and (ii) adversarial testing, which seeks to evaluate LMs' security from an adversarial standpoint. We address both by formulating a new security task called controlled code generation. The task is parametric: it takes as input a binary property that guides the LM to generate either secure or unsafe code, while preserving the LM's ability to generate functionally correct code. We propose SVEN, a novel learning-based approach to solve this task. SVEN leverages property-specific continuous vectors to steer program generation towards the given property, without modifying the LM's weights. Our training procedure optimizes these continuous vectors by enforcing specialized loss terms on different regions of code, using a high-quality dataset that we carefully curated. Our extensive evaluation shows that SVEN is highly effective at achieving strong security control. For instance, a state-of-the-art CodeGen LM with 2.7B parameters generates secure code 59.1% of the time. When we employ SVEN to perform security hardening (or adversarial testing) on this LM, the ratio is significantly boosted to 92.3% (or degraded to 36.8%). Importantly, SVEN closely matches the original LMs in functional correctness.
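To make "specialized loss terms on different regions of code" more concrete, here is a minimal sketch. It does not reproduce SVEN's actual objective, which is defined in the full paper and may include further terms. The sketch assumes that per-token log-probabilities and next-token distributions have already been computed, and that a binary mask marks the security-relevant tokens (for example, the lines changed by a vulnerability fix). Under these assumptions, a language-modeling term is applied only to the masked region, and a KL term on the remaining tokens keeps the guided model close to the frozen LM. All names (`masked_nll`, `kl_divergence`, `region_loss`, `kl_weight`) are hypothetical, and the KL direction and weighting are illustrative choices.

```python
import math
from typing import Sequence


def masked_nll(token_logprobs: Sequence[float], region_mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over tokens whose mask entry is 1."""
    if len(token_logprobs) != len(region_mask):
        raise ValueError("token_logprobs and region_mask must have equal length")
    selected = [lp for lp, m in zip(token_logprobs, region_mask) if m]
    if not selected:
        return 0.0
    return -sum(selected) / len(selected)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def region_loss(
    token_logprobs: Sequence[float],
    region_mask: Sequence[int],
    frozen_dists: Sequence[Sequence[float]],
    guided_dists: Sequence[Sequence[float]],
    kl_weight: float = 1.0,
) -> float:
    """Hypothetical region-aware loss: LM term on masked tokens, KL term elsewhere."""
    # Language-modeling term restricted to the security-relevant region.
    lm_term = masked_nll(token_logprobs, region_mask)
    # On unmasked tokens, penalize divergence from the frozen LM's distribution
    # (assumed direction: frozen || guided).
    kl_terms = [
        kl_divergence(p, q)
        for p, q, m in zip(frozen_dists, guided_dists, region_mask)
        if not m
    ]
    kl_term = sum(kl_terms) / len(kl_terms) if kl_terms else 0.0
    return lm_term + kl_weight * kl_term
```

For example, `region_loss([-1.0, -2.0, -3.0], [0, 1, 1], [[0.5, 0.5]] * 3, [[0.5, 0.5]] * 3)` evaluates to `2.5`. Only the last two tokens contribute to the LM term, and the first token adds no KL penalty because both distributions are identical. In a real training setup, only the property-specific vectors would receive gradients, while the LM's weights remain frozen.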