Large language models (large LMs) are increasingly trained on massive codebases and used to generate code. However, LMs lack awareness of security and have been found to frequently produce unsafe code. This work studies the security of LMs along two important axes: (i) security hardening, which aims to enhance LMs' reliability in generating secure code, and (ii) adversarial testing, which seeks to evaluate LMs' security from an adversarial standpoint. We address both by formulating a new security task called controlled code generation. The task is parametric and takes as input a binary property to guide the LM to generate secure or unsafe code, while preserving the LM's capability of generating functionally correct code. We propose a novel learning-based approach called SVEN to solve this task. SVEN leverages property-specific continuous vectors to guide program generation towards the given property, without modifying the LM's weights. Our training procedure optimizes these continuous vectors by enforcing specialized loss terms on different regions of code, using a high-quality dataset that we carefully curated. Our extensive evaluation shows that SVEN is highly effective in achieving strong security control. For instance, a state-of-the-art CodeGen LM with 2.7B parameters generates secure code 59.1% of the time. When we employ SVEN to perform security hardening (or adversarial testing) on this LM, the ratio is significantly boosted to 92.3% (or degraded to 36.8%). Importantly, SVEN closely matches the original LMs in functional correctness.
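The abstract says that training enforces specialized loss terms on different regions of code but does not spell them out. The following is a minimal sketch of what region-restricted losses can look like, under stated assumptions. It is not SVEN's implementation. It assumes precomputed per-token log-probabilities of the training program under the target-property vector and under the opposite-property vector, plus a 0/1 region mask marking security-relevant tokens. All function names and the weight `w_ct` are hypothetical. Two terms are shown: a masked likelihood term and a contrastive term that favors the target property over the opposite one on the masked tokens.

```python
import math
from typing import Sequence


def masked_nll(token_logprobs: Sequence[float], mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over tokens where mask == 1.

    Tokens outside the mask contribute nothing, which restricts the loss
    to one region of the code (hypothetical helper, not SVEN's code).
    """
    if len(token_logprobs) != len(mask):
        raise ValueError("token_logprobs and mask must have the same length")
    selected = [-lp for lp, m in zip(token_logprobs, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def masked_contrastive(target_logprobs: Sequence[float],
                       other_logprobs: Sequence[float],
                       mask: Sequence[int]) -> float:
    """Mean of -log(p_target / (p_target + p_other)) over masked tokens.

    Lower values mean the target-property vector assigns the masked tokens
    more probability than the opposite-property vector does.
    """
    if not (len(target_logprobs) == len(other_logprobs) == len(mask)):
        raise ValueError("all inputs must have the same length")
    losses = []
    for lt, lo, m in zip(target_logprobs, other_logprobs, mask):
        if m:
            # Numerically stable log(p_t + p_o) via log-sum-exp.
            mx = max(lt, lo)
            lse = mx + math.log(math.exp(lt - mx) + math.exp(lo - mx))
            losses.append(-(lt - lse))
    return sum(losses) / len(losses) if losses else 0.0


def region_loss(target_logprobs: Sequence[float],
                other_logprobs: Sequence[float],
                mask: Sequence[int],
                w_ct: float = 1.0) -> float:
    """Hypothetical combined loss: masked NLL plus weighted contrastive term."""
    return (masked_nll(target_logprobs, mask)
            + w_ct * masked_contrastive(target_logprobs, other_logprobs, mask))


if __name__ == "__main__":
    # Toy example: 4 tokens, the middle two are the security-relevant region.
    secure = [-0.5, -0.2, -0.3, -0.4]   # log-probs under the "secure" vector
    unsafe = [-0.5, -2.0, -1.5, -0.4]   # log-probs under the "unsafe" vector
    region = [0, 1, 1, 0]
    print(region_loss(secure, unsafe, region))
```

Because tokens outside the mask are simply skipped, the two terms only shape the model's behavior on the security-relevant region. Any separate treatment of the remaining tokens (for example, keeping them close to the original LM) would need its own term, which this sketch omits.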