Large language models (LMs) are increasingly trained on massive codebases and used to generate code. However, LMs lack awareness of security and have been found to frequently produce unsafe code. This work studies the security of LMs along two important axes: (i) security hardening, which aims to enhance LMs' reliability in generating secure code, and (ii) adversarial testing, which seeks to evaluate LMs' security from an adversarial standpoint. We address both by formulating a new security task called controlled code generation. The task is parametric and takes as input a binary property to guide the LM to generate secure or unsafe code, while preserving the LM's capability of generating functionally correct code. We propose a novel learning-based approach called SVEN to solve this task. SVEN leverages property-specific continuous vectors to guide program generation towards the given property, without modifying the LM's weights. Our training procedure optimizes these continuous vectors by enforcing specialized loss terms on different regions of code, using a high-quality dataset we carefully curated. Our extensive evaluation shows that SVEN is highly effective in achieving strong security control. For instance, a state-of-the-art CodeGen LM with 2.7B parameters generates secure code 59.1% of the time. When we employ SVEN to perform security hardening (or adversarial testing) on this LM, the ratio is significantly boosted to 92.3% (or degraded to 36.8%). Importantly, SVEN closely matches the original LMs in functional correctness.
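The core mechanism described above, trainable property-specific continuous vectors steering a frozen LM, can be illustrated with a minimal PyTorch sketch. This is an assumption-laden toy, not the paper's implementation: the tiny GRU-based "LM", all dimensions, and the uniform region weights are placeholders for the real code LM, its prefix parameters, and the paper's specialized region-wise losses.

```python
import torch
import torch.nn as nn

# Minimal sketch of SVEN-style controlled generation via continuous
# prefixes. The toy LM and all sizes below are illustrative assumptions,
# not the architecture or hyperparameters used in the paper.
torch.manual_seed(0)

VOCAB, DIM, PREFIX_LEN = 50, 16, 4

class ToyLM(nn.Module):
    """Stand-in for a frozen code LM: embed tokens, run a GRU, predict next token."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, embeds):
        out, _ = self.rnn(embeds)
        return self.head(out)

lm = ToyLM()
for p in lm.parameters():          # LM weights stay frozen throughout
    p.requires_grad_(False)

# One trainable continuous prefix per binary property: secure vs. unsafe.
prefixes = nn.ParameterDict({
    "sec": nn.Parameter(torch.randn(PREFIX_LEN, DIM) * 0.1),
    "vul": nn.Parameter(torch.randn(PREFIX_LEN, DIM) * 0.1),
})
opt = torch.optim.Adam(prefixes.parameters(), lr=1e-2)

def loss_for(prop, tokens, region_weights):
    """Next-token loss conditioned on a property prefix; region_weights
    would up-weight security-sensitive code regions (here uniform)."""
    tok_embeds = lm.embed(tokens[:, :-1])
    prefix = prefixes[prop].unsqueeze(0).expand(tokens.size(0), -1, -1)
    logits = lm(torch.cat([prefix, tok_embeds], dim=1))[:, PREFIX_LEN:]
    nll = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1),
        reduction="none")
    return (nll * region_weights[:, 1:].reshape(-1)).mean()

# Train the "secure" prefix on a toy batch; only prefix vectors update.
tokens = torch.randint(0, VOCAB, (2, 8))
weights = torch.ones_like(tokens, dtype=torch.float)
initial_loss = loss_for("sec", tokens, weights).item()
for _ in range(50):
    opt.zero_grad()
    loss_for("sec", tokens, weights).backward()
    opt.step()
final_loss = loss_for("sec", tokens, weights).item()
```

At inference, one would prepend the "sec" prefix to harden generation or the "vul" prefix for adversarial testing; because only the small prefix tensors are trained, the underlying LM's weights, and hence its general code-generation ability, are untouched.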