The application of large language models (LLMs) to code generation has evolved from one-shot generation to iterative refinement, yet how security evolves across iterations remains poorly understood. Through comparative experiments on three mainstream LLMs, this paper reveals the iterative refinement paradox: specification drift during multi-objective optimization causes security to degrade gradually over successive iterations. With GPT-4o, for example, 43.7% of iteration chains contain more vulnerabilities than the baseline after ten rounds, and cross-model experiments show that the phenomenon is prevalent. Further analysis shows that simply adding static application security testing (SAST) gating does not suppress the degradation; instead, it raises the latent security degradation rate from 12.5% under the unprotected baseline to 20.8%. The root cause is that static-analysis rules cannot detect structural degradations such as the removal of defensive logic or the weakening of exception handling. To address this problem, we propose the SCAFFOLD-CEGIS framework. Drawing on the counterexample-guided inductive synthesis (CEGIS) paradigm, the framework adopts a multi-agent collaborative architecture that turns security constraints from implicit prompts into explicit, verifiable constraints. It automatically identifies security-critical elements and fixes them as hard constraints through semantic anchoring, enforces safety monotonicity through four-layer gated verification, and continuously learns from failures. Comparative experiments against six existing defense methods show that the full framework reduces the latent security degradation rate to 2.1% and achieves a safety monotonicity rate of 100%.
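The gating idea in the abstract can be illustrated with a minimal sketch. This is not the SCAFFOLD-CEGIS implementation: the names `monotonic_gate`, `refine`, `GateResult`, `propose`, and `count_findings` are hypothetical, anchors are modeled as plain substrings, and only two checks are shown rather than the four layers the paper describes. The sketch assumes a revision is accepted only if every anchored security-critical snippet survives and the scanner's finding count does not rise; rejected revisions yield counterexamples that are fed back to the next proposal, in the spirit of CEGIS.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class GateResult:
    accepted: bool
    counterexamples: List[str] = field(default_factory=list)


def monotonic_gate(
    baseline: str,
    candidate: str,
    anchors: Set[str],
    count_findings: Callable[[str], int],
) -> GateResult:
    """Accept `candidate` only if it is no less secure than `baseline`."""
    counterexamples: List[str] = []

    # Check 1 (assumed): anchored security-critical snippets must survive.
    # A scanner alone would miss this kind of structural removal.
    for anchor in sorted(anchors):
        if anchor in baseline and anchor not in candidate:
            counterexamples.append(f"anchor removed: {anchor}")

    # Check 2 (assumed): the scanner's finding count must not increase.
    before, after = count_findings(baseline), count_findings(candidate)
    if after > before:
        counterexamples.append(f"findings increased: {before} -> {after}")

    return GateResult(accepted=not counterexamples, counterexamples=counterexamples)


def refine(
    code: str,
    propose: Callable[[str, List[str]], str],
    anchors: Set[str],
    count_findings: Callable[[str], int],
    rounds: int = 10,
) -> str:
    """Iterate `propose`, keeping only revisions that pass the gate."""
    feedback: List[str] = []
    for _ in range(rounds):
        candidate = propose(code, feedback)
        result = monotonic_gate(code, candidate, anchors, count_findings)
        if result.accepted:
            code, feedback = candidate, []
        else:
            # Rejected: keep the current code, pass counterexamples back.
            feedback = result.counterexamples
    return code
```

In this sketch, a revision that deletes a guard such as `if not is_safe(x):` is rejected even when a toy scanner reports no new findings, which mirrors the abstract's point that SAST gating alone does not catch removed defensive logic.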