Large Language Models (LLMs) for code generation can replicate insecure patterns from their training data. A common security-hardening strategy mitigates this by fine-tuning models with supervision derived from the final transformer layer. However, this design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable in the representations near the output, which are optimized for next-token prediction. To diagnose this issue, we perform layer-wise linear probing and observe that vulnerability-related signals are most detectable in a band of intermediate-to-upper layers but attenuate toward the final layers. Motivated by this observation, we introduce DeepGuard, a framework that exploits these distributed security-relevant cues by aggregating representations from multiple upper layers through an attention-based module. The aggregated signal drives a dedicated security analyzer trained under a multi-objective loss that balances security enhancement against functional correctness, and it further supports a lightweight inference-time steering strategy. Extensive experiments on five code LLMs show that DeepGuard improves the secure-and-correct generation rate by an average of 11.9% over strong baselines such as SVEN, while preserving functional correctness and generalizing to held-out vulnerability types. Our code is publicly available at https://github.com/unknownhl/DeepGuard.
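The abstract describes the aggregation module only at a high level, so the following is a minimal sketch under stated assumptions, not DeepGuard's actual implementation. It assumes that each layer is summarized by a single pooled hidden vector, that one learned query vector scores the last `num_upper` layers with scaled dot-product attention, that the security analyzer is a logistic classifier, and that the multi-objective loss is a weighted sum of the language-modeling loss and a binary cross-entropy security term. All function names and the weight `lam` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_upper_layers(
    layer_states: Sequence[Vector], query: Vector, num_upper: int
) -> Tuple[Vector, Vector]:
    """Attention-pool the last `num_upper` layer representations (hypothetical helper).

    layer_states[i] is assumed to be one pooled hidden vector for layer i.
    Returns (aggregated vector, attention weights over the selected layers).
    """
    if not 1 <= num_upper <= len(layer_states):
        raise ValueError("num_upper must be between 1 and the number of layers")
    upper = layer_states[-num_upper:]
    dim = len(query)
    # Scaled dot-product score between the learned query and each layer's vector.
    scores = [sum(q * h for q, h in zip(query, state)) / math.sqrt(dim) for state in upper]
    weights = softmax(scores)
    # Weighted sum of the selected layer vectors, dimension by dimension.
    aggregated = [sum(w * state[j] for w, state in zip(weights, upper)) for j in range(dim)]
    return aggregated, weights


def security_score(aggregated: Vector, weight: Vector, bias: float) -> float:
    """Hypothetical linear security analyzer: probability that the code is vulnerable."""
    logit = sum(w * x for w, x in zip(weight, aggregated)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def multi_objective_loss(lm_loss: float, p_vuln: float, is_vulnerable: bool, lam: float) -> float:
    """Next-token loss plus a lam-weighted binary cross-entropy security term (assumed form)."""
    eps = 1e-12  # guards against log(0)
    target = 1.0 if is_vulnerable else 0.0
    bce = -(target * math.log(p_vuln + eps) + (1.0 - target) * math.log(1.0 - p_vuln + eps))
    return lm_loss + lam * bce


# Usage: four toy layers with 2-dimensional states; attend over the top two.
states = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
agg, weights = aggregate_upper_layers(states, query=[1.0, 1.0], num_upper=2)
p = security_score(agg, weight=[0.0, 0.0], bias=0.0)
loss = multi_objective_loss(lm_loss=2.0, p_vuln=p, is_vulnerable=True, lam=0.5)
```

Because the softmax weights are learned rather than fixed, a module of this shape can emphasize whichever upper layers carry the strongest vulnerability signal (the intermediate-to-upper band found by probing) instead of relying only on the final layer.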