GoodVibe: Security-by-Vibe for LLM-Based Code Generation

Large language models (LLMs) are increasingly used for code generation in fast, informal development workflows, often referred to as vibe coding, where speed and convenience take priority and security requirements are rarely made explicit. In this setting, models frequently produce code that is functionally correct but insecure, creating a growing security risk. Existing approaches to improving code security follow one of two routes. Full-parameter fine-tuning is costly and prone to catastrophic forgetting. Parameter-efficient adaptation operates at coarse granularity and offers limited interpretability and control. We present GoodVibe, a neuron-level framework for making code language models secure by default. GoodVibe rests on the key insight that security-relevant reasoning is localized in a small subset of neurons. We identify these neurons using gradient-based attribution on a supervised security task and then perform neuron-selective fine-tuning that updates only this security-critical subspace. To further reduce training cost, we introduce activation-driven neuron clustering, which enables structured updates with minimal overhead. We evaluate GoodVibe on six LLMs across security-critical programming languages, including C++, Java, Swift, and Go. GoodVibe substantially improves the security of generated code while preserving general model utility. It achieves up to a 2.5x improvement in security over base models, matches or exceeds full fine-tuning with over 4,700x fewer trainable parameters, and reduces training computation by more than 3.6x compared to the parameter-efficient baseline (LoRA). Our results demonstrate that neuron-level optimization is an effective and scalable approach to securing code generation without sacrificing efficiency or generality.
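The two core steps named in the abstract, gradient-based neuron attribution followed by neuron-selective fine-tuning, can be illustrated on a toy model. The following is a minimal sketch under stated assumptions, not GoodVibe's implementation. The one-hidden-layer linear model, the squared-error loss standing in for the supervised security task, the gradient-times-activation score, top-k selection, and every function name (`attribute_neurons`, `select_top_k`, `selective_step`) are hypothetical choices made for illustration. Activation-driven neuron clustering is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Data = List[Tuple[List[float], float]]

def forward(W: Matrix, v: Sequence[float], x: Sequence[float]) -> Tuple[List[float], float]:
    """Hypothetical toy model: hidden neurons h = W x, scalar output y = v . h."""
    h = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    y = sum(v_i * h_i for v_i, h_i in zip(v, h))
    return h, y

def loss(W: Matrix, v: Sequence[float], data: Data) -> float:
    """Assumed stand-in for the supervised task: mean 0.5 * (y - t)^2."""
    return sum(0.5 * (forward(W, v, x)[1] - t) ** 2 for x, t in data) / len(data)

def attribute_neurons(W: Matrix, v: Sequence[float], data: Data) -> List[float]:
    """Score each hidden neuron by mean |dL/dh_i * h_i| (gradient x activation)."""
    scores = [0.0] * len(W)
    for x, t in data:
        h, y = forward(W, v, x)
        err = y - t                                # dL/dy
        for i, h_i in enumerate(h):
            scores[i] += abs(err * v[i] * h_i)     # dL/dh_i = err * v_i
    return [s / len(data) for s in scores]

def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring neurons (the assumed 'critical' subset)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def selective_step(W: Matrix, v: Sequence[float], data: Data,
                   selected: Sequence[int], lr: float = 0.01) -> Matrix:
    """One gradient-descent step on W that updates only the rows of selected
    neurons; every other row gets a zero gradient and stays frozen."""
    keep = set(selected)
    grad = [[0.0] * len(row) for row in W]
    for x, t in data:
        _, y = forward(W, v, x)
        err = y - t
        for i in keep:
            for j, x_j in enumerate(x):
                grad[i][j] += err * v[i] * x_j / len(data)  # dL/dW_ij
    return [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W, grad)]

# Usage on made-up toy data: attribute, keep the top 2 of 4 neurons, update only those.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8], [0.05, 0.0]]
v = [1.0, -0.5, 0.7, 0.2]
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.5)]
scores = attribute_neurons(W, v, data)
selected = select_top_k(scores, k=2)
W_new = selective_step(W, v, data, selected, lr=0.1)
print(selected, loss(W, v, data), loss(W_new, v, data))
```

In this toy run, neurons 2 and 0 receive the highest attribution scores. Only their weight rows change, and the training loss still decreases. Masking the gradient of every other row to zero is what "updating only the security-critical subspace" means in this simplified setting.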

翻译：大型语言模型（LLM）正日益应用于快速、非正式的代码生成工作流中，这种常被称为"氛围编程"的模式优先考虑开发速度和便利性，而安全需求往往未被明确要求。在此场景下，模型频繁生成功能正确但存在安全漏洞的代码，导致日益增长的安全风险。现有提升代码安全性的方法依赖于全参数微调或参数高效适配，这些方法要么成本高昂且易引发灾难性遗忘，要么在粗粒度上操作且可解释性与控制能力有限。本文提出GoodVibe——一种通过神经元级优化提升代码语言模型默认安全性的框架。GoodVibe基于关键发现：安全相关推理过程仅集中于少量神经元子集。我们通过监督式安全任务的梯度归因方法识别这些神经元，并实施神经元选择性微调，仅更新该安全关键子空间。为进一步降低训练成本，我们提出激活驱动的神经元聚类方法，实现结构化更新并最小化开销。我们在六种LLM上对GoodVibe进行评估，涵盖C++、Java、Swift和Go等安全关键编程语言。GoodVibe在保持模型通用能力的同时显著提升生成代码的安全性，相比基线模型最高提升2.5倍，仅使用全参数微调1/4700的可训练参数即可达到或超越其效果，较参数高效基线（LoRA）减少3.6倍以上训练计算量。实验结果表明，神经元级优化为代码生成安全提供了一种高效且可扩展的解决方案，同时不牺牲模型效率与泛化能力。